This is how abysmal Ulrich's LACK of understanding of regression
is: Richard is thoroughly confused about
A LINEAR FUNCTIONAL model (in X) such as a straight line,
and a
LINEAR MODEL in regression which is linear in the PARAMETERS
of a regression model.
The above is why Ulrich STILL didn't recognize that
Y = a0 + a1 X + a2 X^2 + a3 X^3 + ... + ap X^p
is a LINEAR multiple regression model, while the functional model
in X is of course not linear.
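To make the point concrete, here is a minimal sketch in Python (standard library only; the data and the helper names design_matrix, solve and ols are made up for illustration, not taken from any post). The columns 1, X, X^2, X^3 enter the model only through the coefficients a0..a3, so ordinary least squares fits the cubic with a single non-iterative solve of the normal equations (X'X) a = X'Y, exactly as it would for any other multiple LINEAR regression. Exact fractions are used so that the noiseless example recovers the coefficients exactly.

```python
from fractions import Fraction

def design_matrix(xs, p):
    """Rows (1, x, x^2, ..., x^p): nonlinear in x, but each column enters linearly."""
    return [[Fraction(x) ** k for k in range(p + 1)] for x in xs]

def solve(A, b):
    """Solve A z = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if M[pivot][col] == 0:
            raise ValueError("X'X is singular: columns are not linearly independent")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [a - factor * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least squares via the normal equations (X'X) a = X'y: one direct solve, no iteration."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

# Noiseless data from Y = 1 + 2X - 3X^2 + 0.5X^3
xs = [-2, -1, 0, 1, 2, 3]
ys = [1 + 2 * x - 3 * x**2 + Fraction(1, 2) * x**3 for x in xs]

coef = ols(design_matrix(xs, 3), ys)
print(", ".join(str(c) for c in coef))  # 1, 2, -3, 1/2
```

With real, noisy data the same call returns the least-squares estimates rather than the true coefficients; nothing about the fitting changes because the model happens to be a polynomial in X.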
This concept is so FUNDAMENTAL in all regression and linear
model courses in statistics beyond the first undergrad course in
regression analysis that Richard Ulrich's post can only be taken
as his OWN indictment of his own ignorance.
I am going to point out some references in the archives that had
already PROVED Richard's blatant deficiency on the subject.
Richard Ulrich wrote:
> On 12 Dec 2005 14:37:22 -0800, "Reef Fish"
> >
> > junoexpress wrote:
> > > Hi,
> > >
> > > I have a simple question about multiple linear regression for a
> > > polynomial model.
>
> Bob "Reef Fish" >
> > That is just another multiple LINEAR regression model, although
> - hmm, that *is* what he said ...
>
> > it is sometimes labeled as a polynomial model because most
> - hmm, that *is* a useful descriptive term ...
>
> > people know what a polynomial is, and not many (proven in
> > the sci.stat.* groups) know what a LINEAR model is, in the
> > multiple regression context/definition.
>
> - Bob launches the slur upon us all. Again.
Slur? Here is some DOCUMENTED evidence, especially
starring the chief malpractitioner and self-proclaimed "statistician",
Richard Ulrich:
The subject of "linear models and linear regression" had been
discussed at length in sci.stat.math.
The thread "What are LINEAR or LINEAR REGRESSION models?"
was a subthread that started on June 1 (from post 111 of a Google
thread to post 213 on June 14).
There were several OTHER related threads in June, including a
careful examination of the "definition" of a LINEAR model, and the
detailed explanation of various examples of LINEAR regression
models.
Richard Ulrich was the ONLY person who had ALL his answers
wrong! In the excerpts of my post of July 16 below, Richard used
the term "slur", just as he is using it now.
Ulrich> Here, I just respond to one ad-hominem slur.
> [ ...]
RF > Richard Ulrich does not even know what a LINEAR regression is, when
RF > everyone else in the group except him and Bob O'Hara knew.
> [ ... ]
RF> That's an ad hominem slur? The FACT had been so thoroughly
RF> documented in various threads relating to the definition of LINEAR
RF> regression models that only the BLIND would not have known it.
<--------- excerpted from a post summarizing responses -------->
DEFINITION of Linear models>
        PE       Anon         Bob          Ulrich    Dallal
6/14 6/10 6/10-11
(2)     linear   linear       linear       --        linear
(3)     linear   linear*      linear*      nonlin    --
(4)     linear   <wrong>      linear       nonlin    linear
(5)     linear   lin/non      lin/non      nonlin    --
(6)     linear   linear       linear       nonlin    linear
(7)     linear   same as (5)               nonlin    --
(8)     linear   linear       linear       nonlin    linear
<------------------ end excerpt ------------------------------->
<spacing in the tabular form above may not line up when posted>
<for description of models (2) to (8) and explanations as to why they
are LINEAR, by standard definition, see several posts in the archives
such as: http://tinyurl.com/9k9ng >
RF> Richard Ulrich was the ONLY one whose responses were ALL WRONG.
<the models were ALL linear models, while Ulrich said "nonlinear">
RF> Richard Ulrich was also the ONLY person in the world who said
Ulrich> Y = b X is a NONLINEAR regression model.
RF> Now, what's ad hominem about the above FACTS in the archives?
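Y = b X is about as LINEAR as a regression model gets. A minimal sketch, assuming ordinary least squares through the origin (the function name fit_through_origin and the data are hypothetical, for illustration only): minimizing sum (y - b x)^2 gives b = sum(x y) / sum(x^2), a closed form with no iteration, and the estimate is itself a linear function of the y's.

```python
def fit_through_origin(xs, ys):
    """Least-squares slope for the model Y = b X (no intercept).

    Minimizing sum((y - b*x)**2) over b gives b = sum(x*y) / sum(x*x):
    a closed form with no iteration, because the model is linear in b.
    """
    sxx = sum(x * x for x in xs)
    if sxx == 0:
        raise ValueError("all x are zero, so b is not identifiable")
    return sum(x * y for x, y in zip(xs, ys)) / sxx

xs = [1.0, 2.0, 3.0, 4.0]
y1 = [2.1, 3.9, 6.2, 7.8]
y2 = [0.5, -0.5, 1.0, 0.0]
y_sum = [a + b for a, b in zip(y1, y2)]

# The estimate is also linear in the data: b(y1 + y2) = b(y1) + b(y2),
# up to floating-point rounding.
print(fit_through_origin(xs, y1))
print(fit_through_origin(xs, y_sum), fit_through_origin(xs, y1) + fit_through_origin(xs, y2))
```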
For one whose knowledge about LINEAR models in regression is so
pitifully deficient, Richard Ulrich has the gall to post a piece
that can only be described by the untranslatable Yiddish term
"Chutzpah", approximately translated as 'unadulterated gall'.
> The only authoritative statement of "linear models"
> with widely stated consensus seems to be that engineers
> do not allow polynomials.
The authors Kendall & Stuart, Neter, Wasserman, and Kutner, Cook
and Weisberg, Draper and Smith, Graybill, and numerous authors
of statistical textbooks had been mentioned and cited in the
extended discussion of LINEAR models, and ALL of those authors'
definitions were IDENTICAL, all admitting a polynomial model to be
merely a special case of a linear model.
And all Ulrich can mutter is his three lines pointing to engineers
(without citing any) as his source of MISinformation, whereas
ALL the relevant information and DISCUSSIONS had already
been posted (and are easily retrievable by Google) in the sci.stat.math
newsgroup during the period of April to June, 2005!
> I just want to point out that the members of the sci.stat.*
> groups have rather good facility in modeling, and with a
> wider scope that Bob considers to be philosophically allowable.
>
> Bob, himself, IMHO, has a rather limited facility in complex
> discussion -- in defining terms, in accepting when other people
> are addressing something other than his own focus.
It is a disgrace to the sci.stat.* newsgroups that someone as
uneducated (in statistical subjects) as he is, who has made as many
blunders as he has in the decade in which he BRAGGED about his
quackery and malpractice as his "contributions", had been overlooked
for so long -- Ulrich having led so many astray as he had gone
astray -- without anyone blowing the whistle on him until I came
along this year (2005). In a matter of a few short months, Richard
Ulrich has proven, beyond a shadow of a doubt, that he is completely
INCOMPETENT in the theory AND practice of statistics.
>
> Here's my summary of what was occurring in the thread
> that Bob is citing. His response in the thread exemplifies
> what I just mentioned about his "focus".
>
> http://groups.google.com/group/sci.stat.math/msg/27d3a340437b4d3d?hl=en&
That was Richard Ulrich's defence(?) of what he called my ad hominem
slur on him when I pointed out
Ulrich> Y = b X is a NONLINEAR regression model.
which was only ONE among his SIX errors in the same post, and that's
not counting Richard Ulrich's errors in DOZENS of posts in the threads
on regression analysis, model building, and data analysis.
> [snip, rest]
Richard Ulrich, there is NO WAY you can snip away your footprints
(of the feet dangling from your mouth) in the archives of sci.stat.*
groups in which your ignorance and blunders had been thoroughly
EXPOSED -- including (that's how laughable Richard is) those
exposed by Richard HIMSELF -- such as the post to which I follow
up now.
-- Reef Fish Bob.
You made that statement because you primarily reside in
sci.stat.edu and sci.stat.consult, while I post most of my statistical
substance in sci.stat.math. Richard Ulrich chose to fire HIS gratuitous
shot in sci.stat.edu, where the topic had not been discussed nearly as
thoroughly as it had been in sci.stat.math!
Richard Ulrich was caught more than once making gratuitous
attacks on me in sci.stat.edu OR sci.stat.consult when he knew
good and well that I don't usually read those groups, AND the
substance of his attacks was only to PLAY TO THE CROWD in the
.edu and .consult groups.
These were Richard Ulrich's shots:
RU> - Bob launches the slur upon us all. Again.
RU> The only authoritative statement of "linear models"
RU> with widely stated consensus seems to be that engineers
RU> do not allow polynomials.
RU> I just want to point out that the members of the sci.stat.*
RU> groups have rather good facility in modeling, and with a
RU> wider scope that Bob considers to be philosophically allowable.
RU> Bob, himself, IMHO, has a rather limited facility in complex
RU> discussion -- in defining terms, in accepting when other people
RU> are addressing something other than his own focus.
That was why I pointed out to readers of all THREE groups that
Ulrich's attack was completely uncalled for, by citing threads in
sci.stat.math in which the subject was discussed at length and
IN DEPTH.
To respond to Ulrich's
RU> Bob, himself, IMHO, has a rather limited facility in complex
RU> discussion -- in defining terms,
I cited only ONE of my dozens of posts on the subject, a post
that was not only detailed and in depth, but also debunked ALL
of Richard Ulrich's self-serving excuses in his post, as well as the
absence of any valid defense in the post HE chose to cite:
> http://groups.google.com/group/sci.stat.math/msg/27d3a340437b4d3d?hl=en&
which was why I said,
RF> That was Richard Ulrich's defence(?) of what he called my
RF> ad hominem slur on him when I pointed out
Ulrich> Y = b X is a NONLINEAR regression model.
RF> which was only ONE among his SIX errors in the same post,
RF> and that's not counting Richard Ulrich's errors in DOZENS of
RF> posts in the threads on regression analysis, model building,
RF> and data analysis.
IMHSHO, you have fallen prey to Richard Ulrich's statistical quackery,
malpractice, and self-serving obfuscation on statistical matters for
TOO LONG, in sci.stat.edu, sci.stat.consult, and the .spss groups.
> Why don't you let this thread topic and tone die, it is becoming a
> burden.
Because it is my DUTY, as a professional statistician, to point out
to the unwary what errors and blunders have frequently been made
by Richard Ulrich, and where the CORRECT statistical substance
and discussions can be found: in the sci.stat.math group.
As a matter of fact, the thread topic and tone WOULD have died
in sci.stat.edu (because I seldom ever read anything Ulrich posts
except when he chose to pick on what *I* posted, as he did).
In that respect, Richard Ulrich was begging to have his ignorance
EXPOSED, especially in the topics of regression, model-building,
and their proper applications.
That's also why (since I seldom EVER post anything in sci.stat.consult
except through cross-posting of some important warnings about quackery)
I concluded my post by stating:
RF> Richard Ulrich, there is NO WAY you can snip away your footprints
RF> (of the feet dangling from your mouth) in the archives of sci.stat.*
RF> groups in which your ignorance and blunders had been thoroughly
RF> EXPOSED -- including (that's how laughable Richard is) those
RF> exposed by Richard HIMSELF -- such as the post to which I follow
RF> up now.
Richard Ulrich is the DEEP CANCER that has been growing and growing
in the sci.stat.* groups, for a decade, to the extreme detriment of the
entire statistical profession!
You can call me "Deep Fish" if you like it better than my posting name
-- Reef Fish Bob.
Art
Art, you should know better! I thought you've been around enough
real statisticians to have learned NOT to condone the kind of noise
statmanz was making, not to mention your failure to recognize the
statistical quackery, heresy, and malpractice Richard Ulrich has
been peddling in these supposed STAT groups!
Shame on you, Art!
>
> stat...@earthlink.net wrote:
>
> > Why don't you let this thread topic and tone die, it is becoming a
> > burden.
-- Reef Fish Bob.
Ulrich's also to blame for the war in Iraq, AND Hurricane Katrina.
Sorry, Bob Dole.
I always give credit where credit is due.
George Bush is to be credited (or blamed) for the war in Iraq,
and God is to be credited (or blamed) for Katrina.
-- Reef Fish Bob.
"Re: Richard Ulrich EXPOSED his own Ignorance in LINEAR Regression
Models and other matters in Statistics!"
because I think Brett's comments (misunderstanding of my statements)
deserve a detailed explanation, in a serious discussion of an extremely
serious subject in the three sci.stat.* statistics groups.
Call this a "rant" if you wish. Since "imitation is the sincerest form of
flattery", I am merely imitating what my mentor said of himself, on the
occasion of his receipt of an honorary Doctor of Science degree:
"I am a man of many words. If I were to speak extemporaneously,
I can probably hold myself spellbound for an hour ..."
because Statistics is the subject and the profession I loved, and
I am extremely sad to see the state it has fallen into, resulting in,
and as a result of, what I've witnessed in the past decades,
culminating in the SHOCK I received, when I read and participated
in the sci.stat.* forums.
Brett Magill wrote:
> Reef Fish wrote:
> >
> > Richard Ulrich is the DEEP CANCER that has been growing and
> > growing in the sci.stat.* groups, for a decade, to the extreme
> > detriment of the entire statistical profession!
> >
> > -- Reef Fish Bob.
> >
> Wow, I am disappointed to hear that the "entire statistical profession"
> is so weak as to be undermined by the "deep cancer" growing in USENET
> statistical discussion groups.
Your misinterpretation of my statement suggests that some
clarification is needed.
It has been well known for decades that the weakest part of the
statistical profession consists of those who practice statistics without
adequate training, mistaking the reading of a chapter of a computer
manual or of a statistics book as sufficient for them to apply
statistics properly and to advise others on statistical matters.
This phenomenon is very evident in all three of the sci.stat.* groups
in which the subject about Richard Ulrich was posted.
There is absolutely no question in my mind that Richard Ulrich was
inadequately and improperly trained in statistics. There are others
like him in these groups, EXCEPT:
1. The others are NOT as frequent posters, nor do they act as freely
in advising others on subjects of which they are very ignorant.
2. Most of the others DON'T claim to be statisticians (e.g., Ross
and Afonzo to name just two), and no one takes them seriously.
But Richard Ulrich was and is taken seriously by more than a
handful of readers in these groups even when he was totally
wrong in his assertions or advice!
3. The WORST reflection on the "entire statistical profession"
is the FACT that Richard Ulrich has been peddling his statistical
quackery and malpractice for a DECADE, getting "thanks" for his
mal-advice from some who needed sound advice, while NOT
being corrected, or having red flags raised, by readers of these
groups about his malpractice.
(1) and (2) above contributed to the background of my statement, but
(3) is the MAIN reason for my statement, which was NOT, nor was it
meant to be, hyperbole at all.
In these three groups, there ARE competent statisticians in academia,
in industry, and in other fields in which statistics is practiced. Why
are THESE people not blowing the whistle on Richard Ulrich when
they SHOULD have, even though they are unable to rebut the
reasons I gave when Richard Ulrich was making, and repeating,
all kinds of 100% wrong statements about subjects as commonly
practiced as regression analysis?
Ulrich had made blunders about what LINEAR regression is (which
I had thoroughly discussed and explained in sci.stat.math); about
the statistical and interpretive meanings of multiple regression
coefficients; about the SIGNs of regression coefficients; and about
making causal inferences from correlation and regression results
when the data were obtained by uncontrolled observational
collection. In short, every POSSIBLE mistake that could be made
about multiple regression HAD BEEN made by Richard Ulrich
(which can easily be found in the Google archives).
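On the SIGNs point, a toy sketch in Python (data invented purely for illustration, not from any of the threads; the helper names are hypothetical): when two predictors are highly correlated, the coefficient of x1 in the multiple regression can have the opposite sign from the slope of y on x1 alone. Here y = 3 x2 - x1 exactly, yet regressing y on x1 by itself gives a positive slope.

```python
from fractions import Fraction

def centered_cross(u, v):
    """Sum of (u_i - mean(u)) * (v_i - mean(v)), in exact arithmetic."""
    mu = Fraction(sum(u), len(u))
    mv = Fraction(sum(v), len(v))
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

def simple_slope(x, y):
    """Slope of y on x alone (with intercept)."""
    return centered_cross(x, y) / centered_cross(x, x)

def two_predictor_slopes(x1, x2, y):
    """(b1, b2) in y = a + b1*x1 + b2*x2, from the centered 2x2 normal equations."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2  # would be zero if x1 and x2 were exactly collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Toy data: x2 tracks x1 closely, and y = 3*x2 - x1 exactly.
x1 = [1, 2, 3, 4, 5]
x2 = [1, 2, 3, 4, 6]
y = [3 * b - a for a, b in zip(x1, x2)]

print(simple_slope(x1, y))      # 13/5: positive on its own
b1, b2 = two_predictor_slopes(x1, x2, y)
print(b1, b2)                   # -1 3: x1's coefficient is negative given x2
```

Neither number is "wrong": the multiple regression coefficient describes the change in y per unit of x1 with x2 held fixed, which is a different question from the one the simple slope answers.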
Richard Ulrich didn't even know that the "normality assumption"
does NOT apply to the PREDICTOR or independent variables in a
multiple regression -- even after I had explained it to the OP, who
made the same mistake.
I was WELL aware of the abuses of statistics (in PUBLICATIONS) by
many social scientists, economists, psychologists, and even some
who had been trained (and obviously badly) as statisticians. But even
having observed that REALITY in my decades of work in the
statistical profession (as a professor, as an editor and referee for
at least two dozen journals that published statistical applications,
and as a statistical consultant), I was NOT prepared to see the kind of
abuse of statistics that had been ONGOING in these discussion
groups, led, in FREQUENCY at least, by the LEAST competent person
I've ever met who CLAIMS to be a (trained) statistician.
The aggregate of ALL the reasons stated above is PART of the
reason for my indictment of the "entire statistical profession", as
reflected by what's happening in these newsgroups that are
supposed to be the forum for SERIOUS statistical discussions.
The "educators" and "academic institutions of higher learning" in
the subject of statistics are not exactly above reproach in my
global indictment of the statistical profession.
In May 2005, not long after I first started reading and participating
in sci.stat.math, I had already initiated a thread titled
"Educators or Salesmen Who Sold Their Souls to the Devil?"
partly as a result of the shocking abuse I had seen in these groups,
some of which MUST be blamed on the so-called "educators" in
the USA. Before the end of May, the thread had reached 285
posts (according to the threaded count in groups.google.com).
This was post #267 in that thread:
in which I documented some of the abuses by "educators" and
"students" from my first-hand observations. The university in
question is NOT a 2nd-, 3rd-, or lower-tier institution or a
community college, but one of the TOP-ranked research
universities in the Top Tier of universities in the USA,
according to the World News ranking each year. I shudder
when I try to imagine what's happening in the OTHER lower-tier
schools.
My INITIAL post of that thread was
in which (again) I gave my first-hand account of what I observed
as the decline in the quality of the educational system in the USA
since pre-1960, when it was already far behind those of other major
countries, and even behind some third world countries, in science
and mathematics. It has only gotten WORSE since then.
Richard Ulrich was the first to jump into that thread, disagreeing
with all the inferences I drew from my experience. I didn't know at
the time that Richard Ulrich would turn out to be the SHINING example
of the PRODUCT (the worst of it) of the USA educational
system, because he not only learned little, but he was brash
enough to argue with me at every opportunity in which he was
caught red-handed making statistical blunders in his posts!
> Perhaps THAT should shake one's faith in
> statistics, more than the content of the discussions themselves. Or,
> perhaps Reef Fish is just prone to hyperbole.
The lengthy response above should convey what I meant by my
statement (indictment) about the "entire statistical profession". To
summarize, if that point still hadn't been made clearly enough:
There are MANY contributing factors to the current ILLS of the
entire statistical profession, from those who TEACH the subject
to those who PRACTICE the subject.
I noticed that, in your participation in the sci.stat.* groups, you had
only one direct encounter with Richard Ulrich's patented style
of posting on subjects of which he is completely IGNORANT
while giving his gratuitous advice.
From: Brett Magill <magillb@*nomail*.sbcglobal.net>
Newsgroups: sci.stat.edu
Subject: Cluster Analysis Suggestions
Date: Mon, 29 Nov 2004 18:57:07 GMT
Brett> Hoping for some suggests or direction to appropriate resources.
(detailed explanation of Brett's questions snipped).
This was Richard Ulrich's response:
RU> I'm writing as someone who thinks clustering is usually foolish.
He then proceeded to obfuscate on the subject of "clustering", and
his post only showed unmistakably that he knows NOTHING about
that subject, not even enough to scratch its surface, before he went
on to name-drop Factor Analysis and ANOVA, without addressing any
of Brett's questions.
Ulrich ended a lengthy post of VACUOUS substance with his
suggestion for malpractice:
RU> Just doing the ANOVAs is probably a pretty good start for
RU> describing some 'clusters.'
Anyone who knows anything about cluster analysis, or has read
any of the key references on the subject (from Sokal's book on
Numerical Taxonomy to ANY of the dozens of books and hundreds
of published articles I've referenced on "cluster analysis"), would
have seen how laughable and completely inappropriate Ulrich's
comment was, given Brett's questions about Cluster Analysis!
To recapitulate still another response to Brett's comment:
> Perhaps THAT should shake one's faith in
> statistics, more than the content of the discussions themselves. Or,
> perhaps Reef Fish is just prone to hyperbole.
Let it be said, on record, and unequivocally so, that there is
NOTHING I said that "should shake one's faith in statistics".
What I've said, implicitly if not explicitly, and OFTEN, is that there
are MANY statistical quacks in these newsgroups, as well as
in the statistical profession at large, who are ill-equipped to
practice statistics properly OR to recognize malpractice quacks
and snake-oil salesmen when they see them.
Yes, Virginia. There ARE competent statisticians doing statistics.
They are few and far between.
Yes, Virginia. There are far more quacks who MALPRACTICE
statistics. Let the Buyer Beware.
What I have been protesting in these newsgroups is the same as
what I had said, in 1982, in the Journal of the American Statistical
Association, in my book review on "Correlation and Causation":
*> "I am less perturbed by the poor substantive quality of this
*> book than by the fact that we are witnessing the emergence of
*> a subculture of economists and social scientists, who are no
*> more qualified or equipped to practice statistics than law
*> or medicine, yet who nonetheless do practice it among
*> their circles of nonstatisticians, without much visible signs
*> of protest from the community of statisticians. I feel
*> obliged to register my strongest protest against this type of
*> malpractice, fostered by the title and content of this book."
23 years later, I need to change only "this book" to "many of the
articles posted in sci.stat.* newsgroups".
-- Reef Fish Bob.
> A 4 hour weather related flight delay in Cleveland gives me plenty of
> time to respond to the post of Richard Ulrich on "Polynomial regression
> analysis", in which Richard throughly EXPOSED and indicted himself
> on his ignorance about regression analysis in general, and LINEAR
> models in regression in particular.
>
> This is how abyssmal Ulrich's LACK of understanding of regression
> problem is: Richard is thoroughly confused about
>
> A LINEAR FUNCTIONAL model (in X) such as a straight line,
>
> and a
>
> LINEAR MODEL in regression which is linear in the PARAMETERS
> of a regression model.
No. Bob is thoroughly missing the point that
I was making -- originally, and again.
Who decides what to *call* a "linear regression model"?
The engineers (or so I gathered from earlier posts to the
net groups) include only what Bob gives a useful label
to, "linear functional model." One of the things I asked for
before -- if I remember correctly -- was something like,
"Give us some useful adjectives, instead of one
undifferentiated, absolute declaration."
Bob set out to push the margins. Or to match some Platonic
ideal inside his own head. I thought it would be more useful
to define what was being accepted *and* rejected in the course
of expanding a "definition" since that would (probably)
matter to someone else, eventually.
I regarded it as a scholastic pursuit of logic and consensus,
which would not entail labeling every other participant as
"wrong." And that is not out of "kindness." The use of terms,
as I see it even in many mathematical circumstances (though
not all), is a matter of consensus. I'm happy to let the engineers
use the same phrase, for instance. I expect other people to
draw some other, particular limits.
If I'm going to be called "wrong" for my statistics, I would
prefer that it be for something statistical, and not for Bob's
mis-reading of an attempt to improve a discussion.
[snip, a bunch more about Bob's definition of the "linear
regression model"; I wasn't a main participant. It was other
folks who complained that he was going too far.]
[snip, additional hostility towards me.]
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
> Who decides what to *call* a "linear regression model"?
Would you believe ALL the statisticians who have written statistical
textbooks about linear models and regression models?
> The engineers (or so I gathered from earlier posts to the
> net groups) include only what Bob gives a useful label
> to, "linear functional model."
Perhaps you should have learned your statistics from statisticians
instead of engineers. Besides, "linear functional model" is a
mathematical term, used by statisticians to distinguish the
universally known LINEAR models (linear in the parameters) from
the functional models.
Richard, you are getting to be very tedious, boring, and obtuse
in your repetition of what is well-known in statistics.
>
> If I'm going to be called "wrong" for my statistics, I would
> prefer that it be for something statistical, and not for Bob's
> mis-reading of an attempt to improve a discussion.
>
> [snip, a bunch more about Bob's definition of the "linear
> regression model"; I wasn't a main participant. It was other
> folks who complained that he was going too far.]
It is very clear that in all your years of statistical malpractice,
you were NEVER a "main participant" in the subject of statistics.
You should have learned it first from your statistical education,
and, having failed to do so, you should have learned it from the
discussion in sci.stat.math, from the post I gave (one among
dozens) that explained the subject in which you are still wasting
my time wallowing in your ignorance. This was the post you
snipped that summarized the discussion of statistical LINEAR
models:
Read it, and read it much more carefully this time.
RF> Richard Ulrich was the ONLY one whose responses were ALL WRONG.
<the models were ALL linear models, while Ulrich said "nonlinear">
Can you honestly blame it on the engineers rather than YOURSELF?
RF> Richard Ulrich was also the ONLY person in the world who said
Ulrich said> Y = b X is a NONLINEAR regression model.
Did you learn THAT from the engineers too? Richard, you should be
so ashamed of yourself that you disappear for a while, so folks
on these newsgroups can forget the kind of blunders you have been
making.
Instead, you choose to return every few days, rehashing your SAME
errors -- which were BLATANT -- and trying to argue your way out of
what you have said on record. No one who has had ANY training in,
and understanding of, regression methods would have made such
gross and ridiculous blunders as you have.
As I've said before, many times -- if, instead of making posts such as
the one I am responding to, you spend your time LEARNING the statistical
substance, you will be doing EVERYONE a favor, and the one who needs
it most is YOURSELF.
-- Reef Fish Bob.
Tautological as it might seem, that's a good way to put it, in ANY
discussion group, on the subject of statistics.
I am glad to see someone in the consulting group articulating some
aspects of the terminology that are usually confusing to those who
are not mathematically trained and/or are poorly trained in statistics.
> And there's a particular reason for the (admittedly somewhat
> confusing) standard nomenclature, which comes down to the issue of
> tractability of the mathematics:
>
> Any polynomial regression model that is linear *in the parameters* and
> has errors that are normally and independently distributed, is an
> instance of a single, fairly simple and well understood mathematical
> model, whose parameters (along with their respective asymptotic
> covariances & p-values) can be easily estimated using non-iterative
> OLS methods. This is known as the general linear model.
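For the record, here is the polynomial case written as the general linear model, in standard textbook notation (the symbols n, sigma^2 and the vector notation are the usual conventions, not taken from the post above):

```latex
y_i = a_0 + a_1 x_i + a_2 x_i^2 + \cdots + a_p x_i^p + \varepsilon_i,
\qquad \varepsilon_i \overset{\text{iid}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n

\mathbf{y} = X \mathbf{a} + \boldsymbol{\varepsilon},
\qquad
X = \begin{pmatrix}
  1 & x_1 & x_1^2 & \cdots & x_1^p \\
  \vdots & \vdots & \vdots & & \vdots \\
  1 & x_n & x_n^2 & \cdots & x_n^p
\end{pmatrix}

\hat{\mathbf{a}} = (X'X)^{-1} X' \mathbf{y}
\quad \text{(provided } X'X \text{ is nonsingular)}
```

Everything nonlinear in x sits inside the fixed matrix X; the estimator is a fixed linear map applied to y, which is why no iteration is needed.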
I'll elaborate on and expand your points, which are well put.
Actually there are TWO keywords in the definition of regression
models that have multiple meanings in different mathematical
contexts, and these words came from different sub-areas of
mathematics that happened to be useful in regression definitions
and models:
LINEAR INDEPENDENT
As we have seen, some people are confused by the term "linear"
in linear models and linear regression models, because the
term LINEAR there came from the area of LINEAR ALGEBRA --
i.e., the linear independence of the basis vectors. That is why
the "independent variables" are so called, whereas economists
and people in some other disciplines feel more comfortable
calling them "exogenous" or "predictor" variables.
But these variables are called INDEPENDENT variables because
they have to be LINEARLY INDEPENDENT -- a concept in Linear
Algebra.
Most folks are familiar with the consequences of violations of
the linear independence assumption/requirement: a singular
X'X matrix and the non-existence of a unique LS solution. For
mathematicians, a set of vectors is either linearly independent
or it's not -- like the definition of a "circle", or an "ellipse",
or a "virgin" for that matter -- it's either one or NOT one,
nothing in between.
But when these concepts and definitions are APPLIED,
especially when a computer often CANNOT tell whether a matrix is
singular (because of roundoff), the notion of something
"almost not linearly independent" -- together with its adverse
consequences -- acquired the term "multicollinearity", which,
strictly speaking mathematically, means linearly
DEPENDENT!!!
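A small illustration, computed in exact rational arithmetic so that roundoff cannot blur it (the vectors and the function name gram_det are hypothetical, chosen for illustration): for a two-column design X = [u v], det(X'X) = sum(u^2) sum(v^2) - (sum(u v))^2, which by the Cauchy-Schwarz inequality is zero exactly when u and v are linearly dependent. An exact multiple gives a singular X'X; a tiny perturbation gives a determinant that is nonzero but minuscule next to the products it is computed from (roughly 3600 here), which in floating point is the "almost not linearly independent" territory of multicollinearity.

```python
from fractions import Fraction

def gram_det(u, v):
    """det(X'X) for the two-column design X = [u v].

    Equals sum(u^2) * sum(v^2) - (sum(u*v))^2, which by the Cauchy-Schwarz
    inequality is zero exactly when u and v are linearly dependent.
    """
    suu = sum(a * a for a in u)
    svv = sum(b * b for b in v)
    suv = sum(a * b for a, b in zip(u, v))
    return suu * svv - suv * suv

x1 = [1, 2, 3, 4]
exact = [2 * a for a in x1]  # exactly 2 * x1: linearly DEPENDENT
# perturb each entry by +/- 1/1000: linearly independent, but only barely
nearly = [2 * a + Fraction(1, 1000) * (-1) ** i for i, a in enumerate(x1)]

print(gram_det(x1, exact))   # 0: X'X is singular, no unique LS solution
print(gram_det(x1, nearly))  # 29/250000: nonsingular, but barely
```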
Thus, contrary to the heresy and misinformation given by
Richard Ulrich in these newsgroups, he was the one who
does not understand (or didn't know) the origin OR the
meaning of the term LINEAR in "linear models".
All he seems to know is that a polynomial is not a linear
FUNCTION of X. In fact, most multiple regression
models are NOT linear FUNCTIONS of their independent
variables. Otherwise, how would one fit curves, and
curved surfaces in high dimensions, via multiple regression
techniques?
The OTHER keyword that is often confused by those
not mathematically trained is the word INDEPENDENT,
already mentioned above in the linearly independent
sense, arising from Linear Algebra.
But the other meaning of INDEPENDENT is one of
STOCHASTIC independence, in the probability sense!
In the probability models underlying a linear (regression)
model, the errors are usually assumed to be
stochastically (or statistically) independent (which
implies uncorrelated <but not vice versa>).
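A standard textbook example of "uncorrelated but not independent", sketched in Python with exact fractions (the distribution is chosen for illustration and the helper names are hypothetical): let X be uniform on {-1, 0, 1} and Y = X^2. Then Cov(X, Y) = E[X^3] - E[X] E[X^2] = 0, yet P(X = 0, Y = 0) = 1/3 while P(X = 0) P(Y = 0) = 1/9.

```python
from fractions import Fraction

# Joint distribution of (X, Y) with X uniform on {-1, 0, 1} and Y = X^2.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}

def expect(f):
    """E[f(X, Y)] under the joint distribution above."""
    return sum(p * f(x, y) for (x, y), p in joint.items())

def marginal(index, value):
    """P(X = value) for index 0, P(Y = value) for index 1."""
    return sum(p for xy, p in joint.items() if xy[index] == value)

cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)
p_joint = joint.get((0, 0), Fraction(0))
p_product = marginal(0, 0) * marginal(1, 0)

print(cov)                  # 0: X and Y are uncorrelated
print(p_joint, p_product)   # 1/3 1/9: but NOT independent
```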
But stochastic independence is as different from linear
independence as a mouse is from a lion. Nevertheless,
there are good reasons for folks like Richard Ulrich to
be utterly and thoroughly confused about those concepts:
they never learned them in the first place, and are
too obtuse to learn them even after the concept of LINEAR
models had been thoroughly explained.
Given your attempt to clarify the issue, I think my further
elaboration will help readers recognize (though they still have
to learn much about linear algebra and mathematics to fully
understand it) that in the definition and application of
regression, there are FOUR completely different mathematical
ideas, all of which are involved:
1. The LINEAR independence of the independent variables
as basis vectors in the Linear Algebra sense.
2. The LINEAR model in multiple regression is LINEAR in the sense
that the model is a linear COMBINATION of the parameters.
3. A LINEAR regression model is not always a linear
FUNCTIONAL model of the "independent variables" X's.
It is a linear FUNCTION only when it is a linear combination
of the Xs -- or the dependence of Y on X is on a hyperplane
in a multivariate space of X.
4. Finally, the errors in a regression model are STOCHASTICALLY
independent, in the random-variable and probability sense.
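Points 2 and 3 can be checked numerically. The sketch below (an illustration assuming NumPy, with made-up noise-free cubic data) fits the polynomial model by ordinary linear least squares on the columns 1, x, x^2, x^3. It then verifies that the fitted mean is linear in the parameter vector, and that it is not a linear function of x.

```python
import numpy as np

def fit_polynomial_ols(x, y, degree):
    """Ordinary least squares for y = a0 + a1*x + ... + ap*x^p.

    The regressors are the columns 1, x, ..., x^p; the fitted mean is X @ a,
    which is linear in the parameter vector a even though it is a polynomial in x.
    """
    X = np.vander(x, degree + 1, increasing=True)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# Hypothetical, noise-free cubic: OLS should recover the coefficients up to rounding.
x = np.linspace(-2.0, 2.0, 25)
true_coef = np.array([1.0, 2.0, -0.5, 0.25])
X = np.vander(x, 4, increasing=True)
y = X @ true_coef
coef = fit_polynomial_ols(x, y, degree=3)
print(coef)

# Linear in the parameters: the mean for a + b is the sum of the means for a and b ...
a, b = true_coef, np.array([0.5, -1.0, 0.0, 2.0])
assert np.allclose(X @ (a + b), X @ a + X @ b)
# ... but not a linear function of x: doubling x does not double the curve.
X_doubled = np.vander(2.0 * x, 4, increasing=True)
assert not np.allclose(X_doubled @ a, 2.0 * (X @ a))
```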
> Further generalizations, e.g., to non-normal distributions and
> correlated error terms, are less tractable and have smaller and more
> recent literatures. These are generalized linear models.
The other SIMPLE terms that are often confused are
"multiple" or "multivariate" vs "univariate" in the definition of a
regression model -- I have THOSE distinctions spelled out in my
Data Analysis Lecture Notes also, as well as the ideas about LINEAR
and INDEPENDENT outlined above.
A MULTIPLE regression pertains only to the number of independent
variables in a UNIVARIATE regression (one with only ONE dep.
variable Y).
A MULTIVARIATE regression is one which has more than one
dependent variable Y, regardless of the number of independent
variables Xs in the model.
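A sketch of the distinction (hypothetical simulated data, assuming NumPy): with two responses stacked as the columns of Y, a multivariate least-squares fit is a single solve with a matrix right-hand side. Its coefficient estimates coincide with those of two separate univariate multiple regressions. What the multivariate formulation adds is the joint treatment of the errors across responses, which this sketch does not cover.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical simulated data
n = 50
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])  # intercept + 2 predictors
Y = np.column_stack([
    X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.1, size=n),  # response 1
    X @ np.array([0.0, -0.5, 3.0]) + rng.normal(scale=0.1, size=n),  # response 2
])

# Multivariate regression: one least-squares solve with a matrix right-hand side.
B_joint, *_ = np.linalg.lstsq(X, Y, rcond=None)  # shape (3, 2): one column per response

# Two univariate multiple regressions, one per response.
B_separate = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(Y.shape[1])]
)
print(np.allclose(B_joint, B_separate))
```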
So, strictly speaking, the fitting of a so-called "polynomial model"
is a UNIVARIATE, MULTIPLE, LINEAR, regression model, with
one CARRIER. :-)
The term "carrier" was introduced by Mosteller and Tukey in their
book, to count "how many" independent variables there are, because
x, x^2, ..., x^p can be said to be p independent variables that are
linearly independent, but there is only ONE variable x -- the others
are merely carrying the information of x in different functional forms!
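The carrier count and the count of linearly independent columns can differ, as this sketch shows (assuming NumPy; the data are made up). Four power columns built from one carrier x are linearly independent when x takes many distinct values. When x takes only three distinct values, the columns cannot have rank greater than three.

```python
import numpy as np

def power_columns(x, p):
    """Regressor columns x, x^2, ..., x^p: p regressors built from ONE carrier, x."""
    return np.column_stack([x ** k for k in range(1, p + 1)])

p = 4
x_many = np.linspace(1.0, 5.0, 30)      # the carrier takes 30 distinct values
x_few = np.repeat([1.0, 2.0, 3.0], 10)  # the carrier takes only 3 distinct values

rank_many = np.linalg.matrix_rank(power_columns(x_many, p))
rank_few = np.linalg.matrix_rank(power_columns(x_few, p))
print(rank_many, rank_few)  # 4 3
```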
To complete the full story of model definitions, there are
GENERALIZED multivariate multiple regression models. These
pertain to the generalization of multivariate regression models
that are standard material in textbooks on multivariate statistical
analysis. Those are far beyond the scope of the present
discussion of univariate, multiple, linear regression models.
-- Reef Fish Bob.
>
> On the other hand, models that cannot be specified linearly in the
> parameters can't be counted on to have any of these relatively
> simple & well-known properties, and are therefore generally much more
> of a headache. That's the reason for the definition.
>
> JW
Reefish's major point is "PUT YOUR BRAIN IN FIRST GEAR" before your fingers
operate in overdrive. It's much more fun to argue "STUPIDLY" than to act
professionally and get your brain into the correct gear. The sci.stat series
is sounding just like a kindergarten argument on the playground. (Tis so,
tis not, tis so........) I have to conclude that "sci.stat..." is just a
bunch of kindergartners arguing. Total loss of professionalism.
Reefish's messages have forced me to do a lot of rethinking, and every time I
find him correct (except for the kindergarten garbage). Bob needs to "get
around his ego" and ignor all the garbage.
Bob is right on the issue of linearity. Matrix operation/computation is
essentially a linear process, no matter how non-linear the numbers are.
Polynomial regression is done by matrix operations, generating powers of the x
values as new variables. There are other ways to do these fits, but they are
not in commercial software. The argument that ALL students will be using
commercial software forces a teaching process that teaches how to use it
rather than the mathematical theory and alternatives.
If anybody claims to speak for engineers in regard to statistics, ask them
how many years they have been READING TECHNOMETRICS and whether they have read
Tukey and also Taguchi.
David Heiser
I think that is an unwarranted indictment.
>
> Reefish's messages has forced me to do a lot of rethinking, and every time I
> find him correct (except for the kindergarten garbage). Bob needs to "get
> around his ego" and ignor all the garbage.
I am not sure what you attributed to MY "kindergarten garbage",
but I'll be glad to put in this generic explanation of "quid pro quo"
in partial defense of myself against the "kindergarten garbage"
attacks by Richard Ulrich and a few others.
I have stated many times that anyone is welcome to attack the
STATISTICAL substance of anything I've posted.
But how can I "ignor <sic> all the garbage"? I agree with anyone
who calls Richard Ulrich's posts "kindergarten garbage", both in
statistical substance AND in his repeated attacks on the basis of
his own ignorance.
Keep in mind that we ARE in an environment with MANY
pre-kindergarten readers, if we are going to use your kindergarten
characterization. Richard Ulrich has been PRAISED many
times for his FAULTY information and advice, and I've been flamed
even more times for pointing out Richard Ulrich's ERRORS.
How can I ignore all this "kindergarten garbage" if I am discussing
posts in these sci.stat groups? All I can do is CORRECT what
was wrong, as clearly as possible, and present what is RIGHT, at
a level that can be understood even by the untrained -- on
whatever should have been obvious to trained statisticians.
Of the STATISTICAL substance of my posts (after all the noise,
flames, smoke, and kindergarten garbage), you said,
DH> everytime I find him correct,
and the first line in DAH's paragraph below.
That's good enough for me. I have posted on at least 100 different
statistical topics in the few months I've participated in the sci.stat
group discussions, amidst kindergarten and pre-kindergarten posters.
If readers learned what's correct from my posts on those topics and
steer away from what I called "quackery", "black magic", and
statistical malpractice, that's more than I could expect!
But I simply find this bit of advice hard to take, and impossible to follow,
in an environment heavily populated by statistical kindergarteners:
DH> Bob needs to "get around his ego" and ignor all the garbage.
If these discussion ngs were anywhere close to the editorial process
in a scholarly publication (such as journals of statistics and its
applications), I could EASILY ignore all the garbage by authors
submitting papers that are garbage. I simply document what's wrong
(as I also do in these newsgroups), and that would
be the end of the editorial process unless the author can provide
reasons and rebuttal to my criticisms to prove me wrong.
That has never happened (that I made an error of judgment in
rejecting something I said was "wrong"),
not even once, in my 30 years of editorial service as referee or
associate editor for DOZENS of reputable statistical journals and
journals of statistical applications, on statistical methodology
and proper practice.
Why should I start posting statistical substance that is
"incorrect" simply because Richard Ulrich and a few others
said so?
Overall, in spite of my disagreement with the tone of David
Heiser's sweeping indictment of the sci.stat groups and his
gratuitous remarks about my role in "kindergarten garbage",
I am gratified to see his voice of reason in
STATISTICAL SUBSTANCE
and his testimony in what's correct and what's incorrect in some
of the threads of statistical topics; in speaking out against
malpracticing quacks, besides me having to simultaneously
1. Blow the whistle on malpractice;
2. Provide the correct substance not once, not twice, but DOZENS
of times, in response to the "kindergarten garbage" against my
correct assertion of statistical facts;
3. Defend myself against "kindergarten garbage" in an environment
in which kindergarten and pre-kindergarten statisticians abound.
I recall a previous encounter with David Heiser in these groups on
July 4, 2005 which makes the present theme deja vu in a way:
Reef Fish wrote:
> David A. Heiser wrote:
> > I think reef fish is Richard Chambers come back to haunt us.
> > DAH
> Richard Chambers, whoever he was, certainly did not
> successfully expose, educate, or reform, Richard Ulrich and
> his fellow Quacks and malpractitioners in sci.stat.* because
> they are still running amuck in these ngs peddling their
> Quackery, and selling their statistical snake oil.
RF> Geez, and I thought this "Richard Chambers" was someone who knew
RF> something about statistics and "haunted" the Quacks when I made
RF> the above statement.
RF>
RF> It turned out that David Heiser meant William (Bill) Chambers,
RF> and Bill was just ANOTHER Quack like many of you here.
In the respect of being a sympathizer of someone who haunts
statistical quacks in sci.stat groups, I welcome David Heiser's
present post.
More non-kindergarteners in statistics in these groups should
speak out, and speak out often, on what's RIGHT and what's WRONG
on matters of statistical substance and statistical practice!
I thank you, David Heiser, in that respect, in your post.
-- Reef Fish Bob.
> Actually there are TWO keywords in the definition of regression
> models that have multiple meanings in different mathematical
> contexts, and these words came from different sub-areas of
> mathematics that happened to be useful in regression definitions
> and models:
>
> LINEAR INDEPENDENT
>
> As we had seen, some people are confused by the term "linear"
> in linear models and linear regression models, because the
> term LINEAR there came from the areas of LINEAR ALGEBRA --
> i.e., the linear independence of the basis vectors. That is why
> the "independent variables" are so called, whereas economists
> and people in some other disciplines feel more comfortable
> calling them the "exogenous" or "predictor" variables.
>
> But these variables are called INDEPENDENT variables because
> they have to be LINEARLY INDEPENDENT -- a concept in Linear
> Algebra.
< rest of well written message cut >
Bob,
This is a well written and educational message. It has helped me better
understand and appreciate the terms "linear" and "independent" in the
context of regression. Thank you for explaining it.
Now that you have posted a clear and thorough message on this subject, I
humbly suggest that you move on and give this topic and Richard Ulrich a
rest. I think if you apply your considerable statistical expertise to other
topics under discussion -- rather than focusing so much energy on this
particular topic -- you will be able to make a more valuable contribution to
the group. I know I would learn more from reading your explanations of
other statistical topics than hearing your criticisms of Richard.
--
Phil Sherrod
(phil.sherrod 'at' sandh.com)
http://www.dtreg.com (decision tree modeling)
http://www.nlreg.com (nonlinear regression)
http://www.LogRover.com (web traffic analysis)
http://www.NewsRover.com (Usenet newsreader)
[snip, some]
> Actually there are TWO keywords in the definition of regression
> models that have multiple meanings in different mathematical
> contexts, and these words came from different sub-areas of
> mathematics that happened to be useful in regression definitions
> and models:
>
> LINEAR INDEPENDENT
>
> As we had seen, some people are confused by the term "linear"
> in linear models and linear regression models, because the
> term LINEAR there came from the areas of LINEAR ALGEBRA --
> i.e., the linear independence of the basis vectors. That is why
> the "independent variables" are so called, whereas economists
> and people in some other disciplines feel more comfortable
> calling them the "exogenous" or "predictor" variables.
>
> But these variables are called INDEPENDENT variables because
> they have to be LINEARLY INDEPENDENT -- a concept in Linear
> Algebra.
Why are the variables called independent?
Bob offers an entertaining hypothesis, and I think that it
has mnemonic value, for remembering which is which.
I can't say that the subject has often arisen, but that is
not an explanation that I have heard before.
The use that I have always had in mind starts with
the notion that there can easily be *one* independent
variable, and one dependent variable -- That one-to-one
relation doesn't invoke linear independence of predictors.
This is a relation that invokes ideas of causation.
The first examples that I google tend to confirm that, as
does a formal definition.
===== From Answers.com, From the American Heritage
independent variable
n.
1. Mathematics. A variable whose value determines the value of other
variables.
2. Statistics. A manipulated variable in an experiment or study whose
presence or degree determines the change in the dependent variable.
===== end of definition.
- nothing in the definition about 'linear independence'.
[snip, interesting discussion]
Thanks Phil. I am gratified to hear that my message on this particular
topic is finally getting through to some who hadn't quite gotten the
message before.
> Now that you have posted a clear and thorough message on this subject, I
> humbly suggest that you move on and give this topic and Richard Ulrich a
> rest. I think if you apply your considerable statistical expertise to other
> topics under discussion -- rather than focusing so much energy on this
> particular topic -- you will be able to make a more valuable contribution to
> the group. I know I would learn more from reading your explanations of
> other statistical topics than hearing your criticisms of Richard.
I take your comment to be both sincere and constructive in intention,
but I must take exception to what you said about "focusing so much
energy on this particular topic" (linear models) and giving "this topic and
Richard Ulrich a rest", as if it were my fault that I have NOT given the
subject or Ulrich a rest.
The fact of the matter is -- I have given CONSIDERABLE effort to the
explanation of the "linear models" subject already, as evidenced by
several LENGTHY threads related to it, and by the statement from an
appreciative participant who went from challenging and flaming me
at each step of the way to expressing his appreciation of my
effort:
pes> The effort is appreciated and I find I'm now where I need to be.
pes> I'm still looking for other published examples, which are
pes> few and far between, but I do understand your argument.
pes>
pes> Thanks again.
This was in JUNE 2005, after I had made at LEAST dozens of posts
in a dozen or so threads covering the SAME ground. The light
finally came on for prescaand when he wrote the above, in a post
that followed my concession that I had run out of different ways of
explaining the same definition and concept:
RF> Perhaps someone ELSE who understands this "linear combination"
RF> and linear vs nonlinear (in the MODEL) concept can give you an
RF> alternative explanation to get you out of your rut of a mental
RF> block.
RF> I have exhausted my ways of explaining the same thing about the
RF> LINEAR COMBINATION aspect of the definition.
For MONTHS after that topic had been thoroughly explained and
discussed, Richard Ulrich CONTINUED to make noise about it,
including in this present thread, right after YOUR (Phil Sherrod)
post, and after David Heiser had explained the same concept to
Richard Ulrich and said to Richard,
DAH> Reefish's major point is "PUT YOUR BRAIN IN FIRST
DAH> GEAR" before your fingers operate in overdrive.
and then proceeded to characterize the exchange between
Richard and myself as "kindergarten garbage".
Instead of telling ME to give Richard Ulrich or any topic a rest,
I think it would be MUCH better if posters told Richard Ulrich
directly, themselves, as David Heiser did. The deafening SILENCE
of the rest of the discussants, when Ulrich was DEAD WRONG
and persisted in making his noise, was what gave Ulrich the
encouragement to continue making his noise!
If YOU and others would tell Richard Ulrich to shut up on the
topics in which he continued to beat his dead horses, and if he
pays heed to what OTHERS tell him, then the sci.stat groups
will be a much better place for ANY discussion of statistical
topics, and raise the level of the discussion above what
Heiser calls the "kindergarten garbage" and I call the
"pre-kindergarteners" who supported Ulrich's false information
and malpractice.
That is what I see as the only way to make progress and
raise the level of the discussion -- get rid of the kindergarten
garbage that continues to be peddled by Richard Ulrich and his
like (who are few and far between now).
-- Bob.
I do try to keep from repeating myself, and I do try
to make sensible points about statistical issues.
Bob is trying to encourage others to see *me* as a negative
influence in the groups; I think his view is shaded by his
long-standing distaste for most of the "social sciences"
- both in content and statistical practices. I'm familiar
with both, and he is not.
I thought that Bob was not-posting in sci.stat.edu because
the questions were pragmatic, and "social science".
On 20 Dec 2005 11:45:16 -0800, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
[snip]
>
> There is absolutely no question in my mind that Richard Ulrich was
> inadequately and improperly trained in statistics. There are others
> like him in these groups, EXCEPT:
>
> 1. The others are NOT as frequent as posters or act as freely in
> advising others on subjects in which they are very ignorant.
>
> 2. Most of the others DON'T claim to be statisticians (e.g., Ross
> and Afonzo to name just two), and no one takes them seriously.
> But Richard Ulrich was and is taken seriously by more than a
> handful of readers in these groups even when he was totally
> wrong in his assertions or advice!
- I've always welcomed questions, additions, editing, or corrections.
That's one reason to take me more seriously than one might
"take Bob", who pretends to have absolute answers, and who
somewhat regularly decides to hammer someone.
>
> 3. The WORST reflection about the "entire statistical profession"
> is the FACT that Richard Ulrich has been peddling his statistical
> quackery and malpractice for a DECADE, getting "thanks" from
> some who needed sound advice for his mal-advice, while NOT
> being corrected or having red flags raised by readers of these
> groups about his malpractice.
- They don't complain. That's because they agree with me?
===== April 19, Reef Fish Bob quoting me and agreeing, 90%.
http://groups.google.com/group/comp.soft-sys.stat.spss/msg/1a184eeb687f97dc?hl=en&
[me] >
> The overall conclusion that I draw, from this and his other posts,
> is that Bob has a conception of *proper* social science aims
> of multiple regression which varies widely from what social
> scientists think they can achieve.
> Bob may tell me whether this is a decent summary or not.
[Bob]
It is over 90% true. I am leaving 10% room for those social
scientists who do NOT abuse the use of statistics and statistical
methods, including [... snip, names, etc.]
======
>
> (1) and (2) above contributed to the background of my statement, but
> (3) is the MAIN reason for my statement, which was NOT, nor was it meant
> to be, a hyperbole at all.
>
> In these three groups, there ARE competent statisticians in academia,
> in industry, and in other fields in which statistics is practiced.
> Why are THESE people not blowing the whistle on Richard Ulrich when
> they SHOULD have, even though they are unable to rebut the
> reasons I gave when Richard Ulrich was making, and repeating,
> all kinds of 100% wrong statements about subjects as commonly
> practiced as regression analysis?
Um. I thought I saw plenty of agreement with me, and none
with Bob, on the discussion of regression. In fact, in the last
discussion or two, Jerry Dallal carried the banner, though he
used only some of the keywords that Bob names here --
>
> Ulrich had made blunders about what is LINEAR regression which
> had been thoroughly discussed and explained by me in sci.stat.math
> about the statistical and interpretive meanings of multiple regression
> coefficients; about the SIGNs of regression coefficients; and about
> making causal inference on correlation and regression results in
> which the data was obtained from uncontrolled observational
> collection. In short, every POSSIBLE mistake that could be made
> about multiple regression HAD BEEN made by Richard Ulrich
> (which can easily be found in the Google archives).
June 14, Jerry Dallal on the Banker's example of knowing the
sign of the coefficient -- Here is the URL.
http://groups.google.com/group/sci.stat.math/msg/58fcfbb9f1788c19
The particular post is a good summary by Jerry. The thread also
features a Box reference, which has been cited several times
as positive support for what social scientists do, contrary to
Bob's use of it.
I don't know that Bob was (metaphorically) limping off the field,
but the game surely did not favor him, IMHO. So, Jerry makes
the same mistakes as I do, but does not post as often....
The earlier "discussion" between Bob and me was something I
have called parallel monologs. My take on it was that Bob
pretty regularly refused to answer any questions or discuss any
point I raised, while mainly repeating himself. He finally did come
up with a Tukey reference -- like the one from Box, everyone
has liked it, and no one else ever agreed with Bob's reading of it.
>
> Richard Ulrich didn't even know that the "normality assumption"
> does NOT apply to the PREDICTOR or independent variables in a
> multiple regression -- even after I had explained to the OP who made
> the same mistakes.
Here, Bob is prevaricating or distorting. I never claimed that.
Elsewhere, I have been cognizant of the fact that, in the social
sciences, we do often have variables that are not properly
scaled for our *purposes* (which purposes are ones that Bob
considers, probably, invalid). So, scaling is not irrelevant, but
*that* is not about any normality assumption.
My guess? Bob's experience with business data leads him astray,
since there, scaling (I think) is usually a given. Also, there,
autocorrelation or other high intercorrelations do make "residuals"
more interesting than graphs of raw data.
In the case in point, I *did* recommend that the user, who failed
to find an expected correlation (this did not trigger Bob, this time),
could look at his predictor-versus-outcome plot to see if anything
was weird -- that could include outliers. Bob was recommending
extensive analyses and diagnostics.
Bob himself has posted on a similar point, I noticed today.
==== April 7, Reef Fish Bob, on looking at pictures, not diagnostics.
http://groups.google.com/group/sci.stat.math/msg/66de436e84fde31d?hl=en&
[snip, much] ...
[Bob]
The cliche "a picture is worth a thousand words" can be aptly
extended to
"a picture is worth a thousand summary statistics" provided the
picture-looker knows what and how to look!
That is more or less the foundation and guiding principle of the
Statistical Graphics Section of the ASA, and of those who think more deeply
about applying statistics than routinely using methods, whether the
non-graphical, analytic methods were invented/discovered by others
or even themselves.
That's my take. YMMV.
====== end of Bob's post
I'm now puzzled at how my recent advice, on that question,
differs substantially from Bob's own earlier comment.
[ snip, rest]
> June 14, Jerry Dallal on the Banker's example of knowing the
> sign of the coefficient -- Here is the URL.
>
> http://groups.google.com/group/sci.stat.math/msg/58fcfbb9f1788c19
>
> The particular post is a good summary by Jerry. The thread also
> features a Box reference, which has been cited several times
> as positive support for what social scientists do, contrary to
> Bob's use of it.
>
> I don't know that Bob was (metaphorically) limping off the field,
> but the game surely did not favor him, IMHO. So, Jerry makes
> the same mistakes as I do, but does not post as often....
I'm not sure how I got dragged into this. If you read the cited thread
carefully, you'll see that the essential point of Bob's and my apparent
disagreement was
-------------------------------------------------------
[I'm >>; Bob is >]
>>As for Roy Welch, you'll have to take that up with him. I was at his
>>short course and that's how he opened it. The point was to impress us
>>with the importance of diagnostics and how he got involved studying
>>them. The message I got was that a banker had rejected an analysis Roy
>>had performed because it did not meet the banker's expectation and when
>>Roy looked closely at the data he saw that the banker was correct.
> If you had expressed it that way, it would not have immediately raised
> the unmistakable red flag. The above would have been quite
> reasonable, and may even be perfectly reasonable, had YOU not said,
> JD> they remarked that the model could not be right because the sign
> JD> of one of the predictors was different from what they expected.
> That is an UNMISTAKABLE indication that the banker had mistaken the sign
> to reflect the sign of the SIMPLE correlation.
-------------------------------------------------------
I've no disagreement with Bob about (a) the sign of the partial
correlation not necessarily reflecting the sign of a simple
correlation, much as I would suspect that Bob would have no disagreement
had my original post said (b) that once a particular partial
correlation is observed in a number of similar settings, one might
carefully investigate a new situation where that same partial
correlation was very different from anything observed previously.
When I wrote my first post, Bob read my comment as (a) when (b) was
intended.
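The sign point in (a) is easy to demonstrate. The sketch below uses made-up, noise-free data (assuming NumPy) in which x2 closely tracks x1 and y = 3*x1 - x2. The simple correlation of y with x2 is positive, while the coefficient of x2 in the multiple regression on x1 and x2 is -1.

```python
import numpy as np

# Hypothetical, noise-free illustration: x2 tracks x1 closely, and y = 3*x1 - x2.
x1 = np.arange(1.0, 9.0)
d = np.array([0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.3])  # small, made-up deviations
x2 = x1 + d
y = 3.0 * x1 - 1.0 * x2

simple_r = np.corrcoef(y, x2)[0, 1]            # sign of the simple correlation of y with x2
X = np.column_stack([np.ones_like(x1), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)   # [intercept, b1, b2]

print(f"simple correlation r(y, x2) = {simple_r:+.3f}")
print(f"multiple-regression coefficient of x2 = {coef[2]:+.3f}")
```

The coefficient of x2 describes y's dependence on x2 with x1 held fixed, so there is no reason for it to share the sign of the simple correlation.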
I do not consider myself perfect. It is possible that there are errors
among the many posts I've made over the years, but I can't recall any
where I haven't self corrected before anyone else noticed it or where I
didn't accept the correction when offered. So, I disagree strenuously
that we make the same types of errors.
I do not have the time that you and Bob are able to devote to sci.stat.
If I did, I might have corrected the errors in the December 11
"Interpretation of ANOVA when the entire population is under study"
thread. On the one hand, it's a shame that the original poster left so
misinformed. OTOH, I've got a bunch of projects due and I'd hoped that
someone else would have straightened things out.
Had my name not been dragged into it, I would not have been responding
to *this* post. If you feel that I have posted errors to sci.stat.*,
please cite them and I (or Bob!) will be happy to correct them.
Bob and I do not see eye-to-eye on everything, but I agree with David
Heiser's assessment of Bob's statistical expertise.
Unfortunately, their being silent when you blundered is what gave you that
mistaken notion.
> > Ulrich had made blunders about what is LINEAR regression which
> > had been thoroughly discussed and explained by me in sci.stat.math
> > about the statistical and interpretive meanings of multiple regression
> > coefficients; about the SIGNs of regression coefficients; and about
> > making causal inference on correlation and regression results in
> > which the data was obtained from uncontrolled observational
> > collection. In short, every POSSIBLE mistake that could be made
> > about multiple regression HAD BEEN made by Richard Ulrich
> > (which can easily be found in the Google archives).
> I don't know that Bob was (metaphorically) limping off the field,
> but the game surely did not favor him, IMHO. So, Jerry makes
> the same mistakes as I do, but does not post as often....
I don't think so. In the examples I cited earlier, Jerry had 4 LINEAR
in his post while Ulrich had NONLINEAR in his post. None of
the others made the same mistakes that you, Richard Ulrich, did.
> > Richard Ulrich didn't even know that the "normality assumption"
> > does NOT apply to the PREDICTOR or independent variables in a
> > multiple regression -- even after I had explained to the OP who made
> > the same mistakes.
>
> Here, Bob is prevaricating or distorting. I never claimed that.
Why don't you let the readers read what you wrote, and what I had
already pointed out, for THEMSELVES? If you say I distorted it,
QUOTE yourself and show HOW I distorted what you posted.
This is the kind of nonsense David Heiser was calling "kindergarten
garbage". That is all there is to what you said here, and we have
gone over it time and again.
> [ snip, rest]
>
> --
> Rich Ulrich, wpi...@pitt.edu
> http://www.pitt.edu/~wpilib/index.html
Give it a rest, Richard. If you were right, others would have seen
it. If what I said were your blunders and they were not, others
would have seen it too.
You are descending from posting "kindergarten garbage" to
"pre-kindergarten garbage", IMNSHO.
-- Bob.
On Tue, 27 Dec 2005 09:01:31 -0500, Jerry Dallal
<gdallal@SPAM_BLOCK.world.std.com> wrote:
> Richard Ulrich wrote:
>
> > June 14, Jerry Dallal on the Banker's example of knowing the
[snip, detail]
> -------------------------------------------------------
>
> I've no disagreement with Bob about (a) the sign of the partial
> correlation not necessarily reflecting the sign of a simple
> correlation, much as I would suspect that Bob would have no disagreement
> had my original post said (b) that once a particular partial
> correlation is observed in a number of similar settings, one might
> carefully investigate a new situation where that same partial
> correlation was very different from anything observed previously.
>
> When I wrote my first post, Bob read my comment as (a) when (b) was
> intended.
Does Bob accept (b)? Or doesn't he consider it "error"?
You have expressed it gently, but that is my position,
which Bob rejects when it comes from me. I did not see him
accept it from you; that would have been notable.
>
> I do not consider myself perfect. It is possible that there are errors
> among the many posts I've made over the years, but I can't recall any
> where I haven't self corrected before anyone else noticed it or where I
> didn't accept the correction when offered. So, I disagree strenuously
> that we make the same types of errors.
Ditto, I've made mistakes that I've corrected or accepted.
So far as I can know, my "error" -- according to Bob -- is (b) with
no substantial difference. I want to know what Bob sees as
different. (Have you seen me make the error Bob claims?)
I did offer as ideal the example of epidemiology, where the
partial correlations have strong prior expectations.
>
> I do not have the time that you and Bob are able to devote to sci.stat.
> If I did, I might have corrected the errors in the December 11
> "Interpretation of ANOVA when the entire population is under study"
> thread. On the one hand, it's a shame that the original poster left so
> misinformed. OTOH, I've got a bunch of projects due and I'd hoped that
> someone else would have straightened things out.
>
> Had my name not been dragged into it, I would not have been responding
> to *this* post. If you feel that I have posted errors to sci.stat.*,
> please cite them and I (or Bob!) will be happy to correct them.
No, I am wonderfully pleased by your posts. Except, perhaps,
that you let Bob wave off your differences at the end.
>
> Bob and I do not see eye-to-eye on everything, but I agree with David
> Heiser's assessment of Bob's statistical expertise.
I agree with you both, that Bob has some fine statistical expertise.
He has produced some wonderful statements on regression,
on time series, on clustering.
He also has a hostile attitude towards social scientists, which
is presently focused on me.
Yet RF is not familiar with the concept of fuzzy logic, where the same
sentence can be true and false at the same time, or actually neither true
nor false. Does this make sense? I guess it does for RU, not for RF.
Now let's see what RF is going to say about me. I am looking forward to
reading his rebuttal and bitter criticism with great interest...
Reef Fish (Prof. R.F. Ling in the real world) has retired, so we can
excuse him for not working any more.
And yes, polynomial regression is
> a non linear regression, viewed as a regression on X. And yes,
> polynomial regression is a linear regression, viewed as multiple
> regression on the powers of X. Both viewpoints are correct. RU is
> right, so is RF.
>
We went through this earlier this year. The accepted definition is that
the regression is linear in the parameters, so a polynomial regression
is a linear regression. What originally caused problems was Reef Fish
claiming that y= b^2 X was a linear regression.
Bob
--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: www.jnr-eeb.org
> And yes, polynomial regression is
> a non linear regression, viewed as a regression on X. And yes,
> polynomial regression is a linear regression, viewed as a multiple
> regression on the powers of X. Both viewpoints are correct. RU is
> right, so is RF.
>
Unless some very nonstandard notation is implied, polynomial regression
is NO kind of regression--linear or nonlinear--on X.
If the standard notation is intended, then while polynomial regression
is a nonlinear FUNCTION of X, it is a linear regression on the response,
no matter how one slices it. I would be amazed to see an authoritative
reference to the contrary.
>Yet RF is not familiar with the concept of fuzzy logic, where the same
>sentence can be true and false at the same time, or actually neither true
>nor false. Does this make sense? I guess it does for RU, not for RF.
This is NOT a question of fuzzy logic; at best, fuzzy logic
needs to be extended to probability to be used as a guide
for taking any action, including writing papers.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
vinc...@datashaping.com wrote:
> Now let's see what RF is going to say about me. I am looking forward to
> reading his rebuttal and bitter criticism with great interest...
Actually, you were already criticized, rightly so, by Bob O'Hara, Jerry
Dallal, and Herman Rubin. I'll simply add a few FACTUAL comments
so as not to disappoint your "great interest", except the "bitter"
part.
Your ignorance on the subject showed in these respects:
1. Google found you posting only in 4 threads this year, 3 of them
since Dec 28, 2005, in sci.stat.math where a great deal of the
FACTUAL matters about "linear models" had been discussed.
2. A groups.google advanced search with keywords "linear models"
and author "reef fish" would have found my posts in 27 threads
in 2005, with dozens of posts in some of them, on the subject.
Had you read my posts, you would have seen all the FACTS in
my numbered references below.
> Just out of curiosity, when does RF have a chance to actually work on
> statistical projects?
Bob O'Hara gave a partially correct answer. I have been retired since
1999 [1], and had given away all my books and journals by 2002
[2]. I had analyzed real data in thousands of projects, from my
students' projects in my Data Analysis courses alone [3].
I participated in nearly as many posts in newsgroups during
the years I had NOT retired (under the same "Reef Fish"
authorship, 1992-1999) [4]. I even gave a rough estimate
that I may have made 100,000 posts [5], and the total time
I spent on those 100,000 posts was less than the time I
spent on any ONE of my papers published in JASA [6].
> He seems to be spending most of his time discussing RU posts.
That is true only to the extent that I RESPOND to most of the
posts that are addressed to ME [7]. In that respect, since
Richard Ulrich made most of the noise [8], that was true of
2 of the three threads you read [9]. But even so, my follow-ups
to RU were less than 1/20 of all my posts [10], since RU did not
know enough about most of the subjects I posted on even to make
noise in them [11]. Of those where he did, David Heiser found
RU to be 100% wrong [12] in RU's disagreements because
DH> Reef Fish's messages have forced me to do a lot of rethinking,
DH> and every time I find him correct
> Unless RF really works on statistical projects,
> he's no more a statistician than RU.
I ignore your insult because of your ignorance in (1) and (2) above.
> And yes, polynomial regression is
> a non linear regression, viewed as a regression on X. And yes,
> polynomial regression is a linear regression, viewed as a multiple
> regression on the powers of X. Both viewpoints are correct. RU is
> right, so is RF.
See comments by Jerry Dallal [13]. Even Bob O'Hara [14] did not
err as badly as he did during those threads.
BO> What originally caused problems was Reef Fish
BO> claiming that y= b^2 X was a linear regression.
Wrong! See: http://tinyurl.com/9k9ng [15]
The original discussion was prompted by an example in Kendall
and Stuart. The cited example was MINE, given in a quiz, as
example (7). Both were thoroughly explained in [15].
    y = b0 + b1 X is a linear regression (even RU knows this).
(6) y = b X is a special case with b0 constrained to be 0, and
    b1 = b. Even Bob O'Hara knew this. RU holds the world record
    for being the only one who claims that to be a NONLINEAR model.
(7) y = b^2 X is a special case with b0 constrained to be 0, and
    b1 constrained to be b^2, hence b1 >= 0.
Of course they are all LINEAR models. For (7), just use the
same 0-intercept program as for (6).
If the estimated b in (6) is positive, then it's the square of the b in
(7).
If the estimated b in (6) is negative, then take your choice:
(a) the constraint would force b^2 to be zero.
(b) you wonder why (a) when the data fit a straight line
y = - X perfectly.
(c) In view of (a) and (b), you question the sanity of the person
who puts the b^2 constraint without any CONTEXT or
JUSTIFICATION (same as Kendall and Stuart did, in their
pedantic example, without ANY context or justification).
But none of the above alters the FACT that by merely re-labelling
the unknown parameters, all of the examples in [15] are LINEAR
models, only some of them have constraints, some justifiably so,
because of the units of measurement, such as b^3 for measurements
in cubic units, and some not justified, as pulled from thin air.
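A minimal sketch of the recipe above for (6) and (7), assuming ordinary least squares through the origin; the function names are illustrative, not from any package. Because the residual sum of squares is a convex quadratic in b1, imposing b1 = b^2 >= 0 just replaces a negative unconstrained slope with 0, which is case (a).

```python
import numpy as np


def fit_through_origin(x, y):
    """OLS slope for the zero-intercept model (6): y = b1 * x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return float(x @ y / (x @ x))


def fit_b_squared(x, y):
    """Model (7): y = b^2 * x, i.e. model (6) with b1 = b^2 >= 0.

    The RSS is a convex quadratic in b1, so under b1 >= 0 the
    minimizer is max(b1_hat, 0).
    """
    b1_hat = fit_through_origin(x, y)
    b1_constrained = max(b1_hat, 0.0)
    b = float(np.sqrt(b1_constrained))  # nonnegative root; -b fits equally well
    return b1_hat, b1_constrained, b


print(fit_b_squared([1, 2, 3], [4, 8, 12]))    # (4.0, 4.0, 2.0)
print(fit_b_squared([1, 2, 3], [-1, -2, -3]))  # (-1.0, 0.0, 0.0): data on y = -X
```

The second call is case (b): the data lie exactly on y = -X, yet the constraint forces the fitted line to Yhat = 0.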
> Yet RF is not familiar with the concept of fuzzy logic, where the same
> sentence can be true and false at the same time, or actually neither true
> nor false. Does this make sense? I guess it does for RU, not for RF.
I am quite familiar with the subjects of "fuzzy sets" and "fuzzy logic",
as they have been subjects of talks at meetings of the National
(CSNA) and International (IFCS) Classification Societies, for which I served
as Program Chairman [16].
See Herman Rubin's comment [17]. The only thing "fuzzy" is your
THINKING, Vince, same as RU's confusion of the crystal CLEAR
concepts of "linear" in "linear model" and "linear function" ([18] in
the post Vince included below).
> Now let's see what RF is going to say about me. I am looking forward to
> reading his rebuttal and bitter criticism with great interest...
The foregoing reply is my standard, pedestrian, FACTUAL reply/rebuttal,
carefully substantiated by references to posts (many) you can find in
sci.stat.math.
-- Reef Fish Bob.
> If the estimated b in (6) is positive, then it's the square of the b in
> (7).
> If the estimated b in (6) is negative, then take your choice:
>
> (a) the constraint would force b^2 to be zero.
And hence the model is not linear: although it is piecewise linear. Or
is this some antiquated definition of "linear" that I'm not aware of (and
neither, apparently, were Kendall, Stuart, Draper or Smith).
Well, let's say I work for a client and I tell him I'm going to perform a
polynomial regression. Both the client and I agree on the concept.
I develop some tools and help the client save millions of dollars,
thanks to my "polynomial regression". Do you think the client would be
concerned if every self-proclaimed (or even true) statistician believed
that I'm dead wrong? Not in the least: for the client my model is
working very well and that's all that matters.
Actually, assuming most of the people who call themselves statisticians
believe a polynomial regression is not a regression (and other similar
oddities), then I would stop calling myself a statistician for fear of
looking ridiculous in the eyes of potential clients.
My bad.
I wrote, "If the standard notation is intended, then while polynomial
regression is a nonlinear FUNCTION of X, it is a linear regression on
the response," which makes no sense.
It is not a linear regression on X (alone). Hence, the term polynomial
regression.
Sorry for the confusion.
--Jerry
Scratch some of that. I plead guilty to being too old to multi-task.
That will be my New Year's resolution--not to multi-task.
Let's sort this out carefully.
The statement that I originally took exception to was
>>>> And yes, polynomial regression is
>>>> a non linear regression, viewed as a regression on X.
Polynomial regression IS linear regression, even according to Kendall &
Stuart (really. They describe it as a special case of the general
linear regression model in their notorious section "The meaning of
'linear'".) It is a nonlinear function of X.
There is no issue about whether polynomial regression is regression.
The name itself answers that question. Many different terms are used to
describe polynomial regression. There's the general "polynomial
regression". Then there are terms like "second order model", "quadratic
regression".
In general, one talks about the regression of the response on the set of
predictors. "Linear regression of (univariate) Y on (univariate) X is
understood to mean something of the form Y = b0 + b1 X".
In summary, the statement to which I took exception was in fact
incorrect. However, my original response included a portion that
doesn't parse.
Did you read this? : http://tinyurl.com/9k9ng
Everything, including K&S definition was covered there.
> Then either tell me how a line
> y=b^2 can be linear,
Is a horizontal line not LINEAR in X?
Is Y = c X not a LINEAR model if c is estimated to be zero?
Bob O'Hara> or admit that you're wrong. (7) is clearly NOT
Bob O'Hara> linear in b.
That's the worst JOKE of the day.
You are hopeless, Bob O'Hara!
Just call b^2 an unknown constant, and label it ANYTHING you want.
k, theta, c, whatever. Then READ how I've re-explained that case,
just for someone as OBTUSE as Bob O'Hara.
> > If the estimated b in (6) is positive, then it's the square of the b in
> > (7).
> > If the estimated b in (6) is negative, then take your choice:
> >
> > (a) the constraint would force b^2 to be zero.
You saw it! And just made ANOTHER blunder of yours.
>
> And hence the model is not linear: although it is piecewise linear.
Are you SERIOUS? The estimated model Yhat = 0 is
"piecewise linear"?
That's the SOLUTION of the linear model Y = b0 + b1 X,
with the constraints stated in (7), if the OLS solution yielded b1 < 0
without the b1 >= 0 constraint.
> Or is this some antiquated definition of "linear" that I'm not aware of (and
> neither, apparently, were Kendall, Stuart, Draper or Smith).
No, that's the same mental block Bob O'Hara had from day 1, even
though everyone else finally understood it (except you and RU of
course).
Here is your LAST CHANCE,
Y = b^3 X
This is the same example except changing the exponent from 2 to 3.
The solution is exactly the same as fitting Y = b1 X,
noting whatever the estimated value of b1, positive or negative,
b is the cube-root of it.
Is that NOT a linear model under everyone's definition,
including Kendall & Stuart's?
-- Reef Fish Bob.
I understood that perfectly and even recommended Vince to read it.
The essential facts are:
A polynomial regression "is a nonlinear FUNCTION of X"
but it is a "Linear regression MODEL" in the conventional and
universal definition of a "linear model" in statistics.
> My bad.
>
> I wrote, "If the standard notation is intended, then while polynomial
> regression is a nonlinear FUNCTION of X, it is a linear regression on
> the response," which makes no sense.
You simply phrased it ambiguously by introducing the unnecessary
word "response".
A LINEAR model is Y (response) = X (matrix) . beta (vector).
> It is not a linear regression on X (alone). Hence, the term polynomial
> regression.
>
> Sorry for the confusion.
Sorry to see you confusing YOURSELF. All of the statements below
are correct:
1. A polynomial is a nonlinear function of X (as you correctly
   stated).
2. Fitting a polynomial to data is an example of fitting a LINEAR
   model, in the sense of a "linear model" in regression analysis.
3. A "polynomial regression", as it is sometimes called by authors of
   textbooks, is UNDERSTOOD to mean BOTH (1) and (2): it
   is BOTH a linear model (in the parameters) and a nonlinear
   function in the space of X.
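A small sketch of points 1 to 3, assuming plain least squares with NumPy (function names are illustrative): the columns 1, X, X^2, ... are nonlinear in X, but the model Y = X beta is an ordinary linear model in beta, so the same least-squares solver used for any multiple regression fits it.

```python
import numpy as np


def polynomial_design_matrix(x, degree):
    """Columns 1, x, x^2, ..., x^degree.

    Each column is a nonlinear function of x, but the model
    Y = X @ beta is a linear combination of the coefficients in beta.
    """
    return np.vander(np.asarray(x, dtype=float), degree + 1, increasing=True)


def fit_polynomial(x, y, degree):
    """Ordinary least squares for Y = a0 + a1 X + ... + ap X^p."""
    X = polynomial_design_matrix(x, degree)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta


x = np.linspace(-2.0, 2.0, 9)
y = 1.0 - 2.0 * x + 0.5 * x**2   # a parabola when plotted against X
print(fit_polynomial(x, y, 2))   # approximately [ 1.  -2.   0.5]
```

The answer agrees with `np.polyfit(x, y, 2)` (which lists the coefficients highest power first), since both solve the same linear least-squares problem.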
Perhaps you should read David Heiser's explanation on why a
polynomial regression is a linear model again.
-- Reef Fish Bob.
f(Y) = SUM_k { a_k * sin(k*g(X)) + b_k * cos(k*g(X)) }
> Sorry to see you confusing YOURSELF.
Thanks. Me, too. It's been one of those days. The one saving grace is
that by the time Google catches up you'll see I self-corrected
immediately after posting.
--Jerry
> Here is your LAST CHANCE,
>
> Y = b^3 X
>
> This is the same example except changing the exponent from 2 to 3.
>
> The solution is exactly the same as fitting Y = b1 X,
>
> noting whatever the estimated value of b1, positive or negative,
> b is the cube-root of it.
While I sometimes dislike Bob's style, I have to agree with him on this.
Obviously,
y = a*x
is a linear equation (as any 7th grade student should be able to tell you).
Replacing the constant, 'a', (or parameter 'a' in the case of regression)
with b^2 [where b=sqrt(a)] yields:
y = b^2 * x
That is still a linear equation of the two variables x and y regardless of
the form of the constant coefficient.
More succinctly, if y = 4*x is linear in x and y, then so is y = 2^2 * x
I think there's a subtlety.
Forget regression. Is Y = b0 + b1 X^2 a linear *function*? I think the
answer is both yes and no. It is a linear function of X^2, but not a
linear function of X.
Back to regression. Is Y = b^2 X a linear regression of Y on X? It is
a linear regression whose parameter is b^2. However, it is not linear
*in b*.
Or, at least, that's where I am at the moment.
Reader - do keep in mind that I respond to only a fraction
(less than 1) of the posts where Bob mentions me. Or dwells
on me. And I ignore much of his "noise" within a post.
I have initiated "noise" very seldom. However, I respond
sometimes where Bob attacks others, in general or by name.
Perhaps Bob did not notice that this is what happens.
[snip]
> > >
> > > y = b0 + b1 X is a linear regression (even RU knows this)
> > > (6) y = b X is a special case with b0 constrained to be 0, and
> > > b1 = b. Even Bob O'Hara knew this. RU holds
> > > the world record for being the only one who claims
> > > that to be a NONLINEAR model.
Bob still fails to "get it." I was not competing.
I'm not stuck on that adjective, NONLINEAR, or any adjective.
I did say before, I think: I was looking for adjectives, and
broader descriptions of consequences. How does one
definition affect, say, computer-programmed solutions?
How else is the definition reflected in theory?
I was trying, at that point in that discussion, to steer the
discussion to a more academic plane, of who uses what,
and why. That might have been before Bob developed the
abstract question to be, "how far can the envelope be pushed?"
I have no objection to pushing the envelope. I think there was
a second textbook mentioned by an occasional poster -- I will
say, "It is okay to go beyond texts, and that one, too."
But I think it is *rational* to discuss any definition in terms of
trade-offs, where you admit what generality you lose by accepting
a different generalization. If engineers do use "linear regression"
to exclude polynomials -- which is what an engineer told us,
a few years ago -- it is not for reasons that matter to me. But
their usage (if that is it) should be noted. Perhaps, just as
Bob did, in that recent post.
(Now, Bob can probably say all this better than I can, if he puts
his mind to it. He speaks statistics to statisticians, whereas I
speak statistics to everyone else. Bob's publications are
theoretical, whereas mine are data analyses.)
The last time that I tried to give this same point, "I was never
in your competition", Bob replied with a good mini-lecture on
polynomials. SO, my previous post wasn't totally a waste.
[snip, down to a convenient example at the end]
> No, that's the same mental block Bob O'Hara had from day 1, even
> though everyone else finally understood it (except you and RU of
> course).
>
> Here is your LAST CHANCE,
>
> Y = b^3 X
>
> This is the same example except changing the exponent from 2 to 3.
>
> The solution is exactly the same as fitting Y = b1 X,
>
> noting whatever the estimated value of b1, positive or negative,
> b is the cube-root of it.
>
> Is that NOT a linear model under everyone's definition,
> including Kendall & Stuart's?
Applying what I intended --
I'm happy to concede that it can be pragmatically *solved*
as a linear model -- is that enough? -- with an extra step needed
to get the error of b instead of b^3. However, this complication
with the error is a material distinction, which is probably the
sign of further distinctions that deserve to be made, whatever the
definition of K&S. I'd be interested in hearing the other
distinctions.
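One common way to do the "extra step" for the error of b is the first-order delta method; this is a swapped-in technique, not something either poster specified. A minimal sketch for y = b^3 X, assuming ordinary least squares through the origin and a residual variance with n - 1 degrees of freedom (the function name is illustrative):

```python
import numpy as np


def fit_b_cubed(x, y):
    """Fit y = b^3 * x by fitting y = b1 * x and taking b = cbrt(b1).

    Delta-method standard error of b:
        SE(b) ~= |db/db1| * SE(b1) = SE(b1) / (3 * cbrt(b1)**2),
    which becomes unreliable (and infinite) as b1 approaches 0.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    sxx = x @ x
    b1 = (x @ y) / sxx
    resid = y - b1 * x
    s2 = (resid @ resid) / (n - 1)   # one parameter estimated
    se_b1 = np.sqrt(s2 / sxx)
    b = np.cbrt(b1)                  # real cube root keeps the sign of b1
    se_b = se_b1 / (3.0 * b**2)
    return b1, se_b1, b, se_b


print(fit_b_cubed([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.9]))
```

The point estimate needs nothing beyond the linear fit; only the standard error needs the extra (approximate) transformation step.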
The subtle trap you dug yourself into, Jerry!
>
> Forget regression. Is Y = b0 + b1 X^2 a linear *function*?
Of course it is NOT a linear function of X. When you think of a
FUNCTION, think of the PLOT of the line or surface at whatever
dimension of the VARIABLE.
What do you get when you plot Y vs X in that example? You get
a parabola with Y intercept at b0.
Now you can LINEARIZE the function by making a transformation
of W = X^2. Then the equation becomes Y = b0+ b1 W, which is
a linear function of W -- a straight line when Y is plotted vs W.
It's that simple.
> I think the
> answer is both yes and no. It is a linear function of X^2, but not a
> linear function of X.
You said it very badly, as you did the other example. Say it slowly,
"It is a linear function of W, if you transform X^2 to W. It is a
nonlinear function of X."
> Back to regression. Is Y = b^2 X a linear regression of Y on X? It is
> a linear regression whose parameter is b^2. However, it is not linear
> *in b*.
>
> Or, at least, that's where I am at the moment.
Drop the word "regression" (which is implied in the context of a linear
MODEL in regression).
Then according to the definition we had seen a thousand times, it
is a LINEAR MODEL if it can be expressed as a
----> linear COMBINATION of the parameters!
The parameters themselves don't have to be linear at all!
That's why Y = b0 + b1 X1 + b2 exp(X2)
and Y = b0 + b1^3 X1 + log(b2) exp(X2)
and Y = f0(b0) + f1(b1) X1 + f2(b2) X2
(provided the functions fi are invertible, such as odd powers
of bi; for even powers it would imply a constraint on the bs)
are all linear COMBINATIONS of the bs (or of functions of the bs), where
the Xi's or functions gi(Xi) play the role of the coefficients of the
linear combination.
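A minimal sketch of the second example above, assuming ordinary least squares and the relabelling c0 = b0, c1 = b1^3, c2 = log(b2) (the function name and the relabelled symbols are illustrative). The fit itself is an ordinary linear model in the c's; the b's are recovered afterwards by inverting the (invertible) functions.

```python
import numpy as np


def fit_reparameterized(x1, x2, y):
    """Fit Y = b0 + b1^3 * X1 + log(b2) * exp(X2).

    With c0 = b0, c1 = b1^3, c2 = log(b2) this is the linear model
    Y = c0 + c1 * X1 + c2 * exp(X2): a linear combination of the c's
    whose "coefficients" are 1, X1 and exp(X2).
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    X = np.column_stack([np.ones_like(x1), x1, np.exp(x2)])
    c, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    b0 = c[0]
    b1 = np.cbrt(c[1])   # odd power: invertible, no constraint needed
    b2 = np.exp(c[2])    # log is invertible on b2 > 0
    return b0, b1, b2


x1 = np.linspace(0.0, 1.0, 10)
x2 = np.linspace(-1.0, 1.0, 10)
y = 1.0 + 2.0**3 * x1 + np.log(3.0) * np.exp(x2)
print(fit_reparameterized(x1, x2, y))   # approximately (1.0, 2.0, 3.0)
```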
Here's a simple, well-known NONLINEAR model (in the
parameters, remember) that can be LINEARIZED -- you'll
find this in just about any textbook such as Draper and
Smith, and the rest.
Y = b0 exp(-b1 X)
It is NOT a linear model because the parameters are in
product form that is NOT a linear combination.
This can be linearized by a log transformation of both sides.
ln(Y) = ln(b0) - b1 X
and voila, it becomes a SIMPLE regression of Y* = ln(Y)
on X* = X, in Y* = a0 + a1 X*
If you do the above simple regression and obtain estimates for a0
and a1, then b0 would be exp(a0) and b1 is -a1.
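The log-linearization above as a short sketch, assuming NumPy and strictly positive Y (the function name is illustrative):

```python
import numpy as np


def fit_exponential_by_logs(x, y):
    """Fit Y = b0 * exp(-b1 * X) through ln(Y) = ln(b0) - b1 * X.

    Simple regression of Y* = ln(Y) on X* = X gives Y* = a0 + a1 X*,
    then b0 = exp(a0) and b1 = -a1. Requires every Y > 0.
    """
    x = np.asarray(x, dtype=float)
    y_star = np.log(np.asarray(y, dtype=float))
    a1, a0 = np.polyfit(x, y_star, 1)   # polyfit returns the slope first
    return np.exp(a0), -a1


x = np.linspace(0.0, 4.0, 9)
y = 2.5 * np.exp(-0.7 * x)
print(fit_exponential_by_logs(x, y))   # approximately (2.5, 0.7)
```

One design caveat: least squares on the log scale implicitly treats the errors as multiplicative in Y, so with noisy data the estimates can differ from a nonlinear least-squares fit on the original scale.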
Of course ln(b0) is NOT linear in b0, but who cares except
those who are CONFUSED. Unconfuse yourself by
taking ANY textbook example of a nonlinear model that can
be "linearized", and you'll see that the parameters can be
any nonlinear functions of the original form.
If anyone in a business school tells you of a Cobb Douglas
production function, just tell him that it's nothing but a
LINEAR model of the simple regression kind.
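A sketch of the Cobb-Douglas case, assuming the usual form Q = A * L^alpha * K^beta with labor L and capital K (the data and function name below are made up for illustration). Taking logs gives ln Q = ln A + alpha ln L + beta ln K, which is linear in (ln A, alpha, beta), with ln L and ln K as the regressors.

```python
import numpy as np


def fit_cobb_douglas(labor, capital, output):
    """Fit Q = A * L**alpha * K**beta on the log scale:
    ln Q = ln A + alpha * ln L + beta * ln K,
    a linear combination of (ln A, alpha, beta).
    """
    lnL = np.log(np.asarray(labor, dtype=float))
    lnK = np.log(np.asarray(capital, dtype=float))
    lnQ = np.log(np.asarray(output, dtype=float))
    X = np.column_stack([np.ones_like(lnL), lnL, lnK])
    (ln_A, alpha, beta), *_ = np.linalg.lstsq(X, lnQ, rcond=None)
    return np.exp(ln_A), alpha, beta


L = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
K = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
Q = 1.5 * L**0.6 * K**0.3
print(fit_cobb_douglas(L, K, Q))   # approximately (1.5, 0.6, 0.3)
```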
Don't confuse the meaning of a linear FUNCTION or nonlinear
FUNCTION of an independent variable (or variables) with the
notion of what constitutes a "linear combination" of parameters
(or functions of parameters).
There is NO ambiguity whatsoever in these concepts.
Hope the Cobb-Douglas type of production function example
above unlocks some of the mental blocks by showing that
the PARAMETERS in a linear model (after the original
model has been "linearized", in the standard terminology) can be nonlinear
functions of the original parameters, as long as they are separable
into a linear combination of them, so that the MODEL fits
the "linear combination" requirement of a linear model.
-- Reef Fish Bob.
If I can be bothered, I'll check Kendall & Stuart to see exactly what
they say: I'm fairly certain they did explicitly state that a linear
regression is linear in the parameters.
Bob
> > Obviously,
> >
> > y = a*x
> >
> > is a linear equation (as any 7th grade student should be able to tell
> > you).
> >
> > Replacing the constant, 'a', (or parameter 'a' in the case of
> > regression)
> > with b^2 [where b=sqrt(a)] yields:
> >
> > y = b^2 * x
> >
> > That is still a linear equation of the two variables x and y regardless
> > of the form of the constant coefficient.
> >
> > More succinctly,
> > if y = 4*x is linear in x and y, then so is y = 2^2*x
> >
>
> I think there's a subtlety.
>
> Forget regression. Is Y = b0 + b1 X^2 a linear *function*? I think the
> answer is both yes and no. It is a linear function of X^2, but not a
> linear function of X.
Well, sure, you can always "linearize" a function by replacing a nonlinear
component with a transformed surrogate. So, using your argument, you would
say:
y = a * sin(x^3+log(x))
is a linear function:
y = a * z
where z = sin(x^3+log(x))
But y is NOT a linear function of x. (Pity the student who identifies
y=a*sin(x^3+log(x)) as a linear function.)
> Back to regression. Is Y = b^2 X a linear regression of Y on X? It is
> a linear regression whose parameter is b^2. However, it is not linear
> *in b*.
I think I understand the root of the confusion in this discussion: It is a
problem of distinguishing between 'variables' and 'parameters'. You need to
consider parameters to be constants when judging the linearity of a
function.
When you perform a linear regression, you start with a prototype function
like:
y = b0 + b1*x
Where x is the independent variable, y is the dependent variable and b0 and
b1 are parameters. At the beginning of the regression, the values of b0 and
b1 are not known, so it is not visually obvious that b0 and b1 are not
variables. But the purpose of regression is to compute constant values for
b0 and b1. So when the regression finishes, we end up with constant values
for b0 and b1 and a function such as:
y = 2 + 4*x
which is obviously a linear function in x and y. As is
y = 2 + 2^2*x
Another way to test the question of whether a function of x and y is linear
is to plot it and see if it produces a straight line; that, of course, is
the reason it's called "linear" in the first place.
y = aX for a >= 0
y = 0 for a < 0
which is clearly not linear: the slope changes at a=0. It's a hockey
stick instead.
Bob
Bob,
I don't disagree with what you're saying. I would hope that at the core
we are saying the same thing.
For example, I don't see a difference between
Me:" I think the answer is both yes and no. It is a linear function of
X^2, but not a linear function of X."
You: ""It is a linear function of W, if you transform X^2 to W. It is a
nonlinear function of X."
Happy New Year!
--Jerry
"Before proceeding further, it is well to emphasise the meaningof the
adjective "linear" in the general regression model (28.59): it is
_linear in the parameters beta_i, not necessarily in the x's. Up to
28.11, on the other hand, we understood by "linear regression" that the
conditional mean value of y is a linear function of the regressors
x_1,...,x_p. From the point of view of our present (Least Squares)
analysis, the latter (pehaps more "natural") definition of linearity is
irrelevant; it is linearity in the parameters that is essential."
(emphasis in the original, all typos added).
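For reference, a sketch of how the "linear in the parameters" definition covers a degree-p polynomial, written in the y = X beta + e form quoted further down (the notation of n observations x_1, ..., x_n is assumed here, not taken from K&S). The powers of x live in the known matrix X; beta enters linearly, and the closed-form least-squares solution assumes X has full column rank.

```latex
\mathbf{y} = X\boldsymbol{\beta} + \mathbf{e}, \qquad
X = \begin{pmatrix}
1 & x_1 & x_1^2 & \cdots & x_1^p \\
\vdots & \vdots & \vdots & & \vdots \\
1 & x_n & x_n^2 & \cdots & x_n^p
\end{pmatrix}, \qquad
\hat{\boldsymbol{\beta}} = (X^{\top}X)^{-1}X^{\top}\mathbf{y}
```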
I would be interested to see if anyone would disagree with this definition.
Bob
That definition is identical to the definition in every book on linear
models and multiple regression I've ever read or taught from. The
K&S definition had been discussed at length, and everyone except
you knows that NO ONE would disagree with that
definition, and if they did, they would be wrong.
Wasn't it YOU who suggested to me to review the linear models
thread? Do you have a browser to read what's referred to in the
web page? The post below was referenced a dozen times in
recent discussion of linear models:
What was in there should have answered your question then, and now:
pescaand...@hotmail.com (henceforth abbreviated as PE) provided
definitive answers to all three of my questions Q1-Q3
about certain SPECIFIC statements in Kendall and Stuart's book(s):
> Q1. What is Kendall and Stuart's DEFINITION of a Linear regression
> model? (all anyone had said was that he said certain models are
> NOT.) If it's different from my Y = X b definition, be precise on
> HOW Kendall and Stuart's definition is different.
On June 13,
PE> Q1: "y=X Beta + e, where Beta is a (kx1) vector of regression
PE> coefficients, X is an (nxk) matrix of known coefficients, <...>
RF> Thank you! NOW we are finally getting SOMEWHERE. So, Kendall and
RF> Stuart's DEFINITION of a Linear model is exactly the SAME as what I
RF> said is the "universal" and "standard" definition after all, and as
RF> I repeatedly said I would have been very surprised or shocked if
RF> it wasn't.
Bob, if you're going to re-enter these discussions rehashing old stuff
that had been beaten to the ground, you would be well advised to
learn HOW to retrieve posts by keywords in the archives, and HOW
to read posts referenced by URL, and then read them!
-- Reef Fish Bob.
So, what about the model
y = b1 x1 + b1^2 x2
where b1 is the parameter? Do you agree that this is non-linear?
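A sketch of why this tied model differs from the relabelling trick, assuming least squares with a single parameter b (the function name is illustrative). No relabelling turns b x1 + b^2 x2 into a linear combination of free parameters, and the stationarity condition is a cubic in b rather than a linear normal equation; the code solves that cubic and keeps the real root with the smallest residual sum of squares.

```python
import numpy as np


def fit_tied_model(x1, x2, y):
    """Least squares for y = b*x1 + b^2*x2 with one parameter b.

    S(b) = sum((y - b*x1 - b^2*x2)^2) is a quartic in b, so setting
    dS/db = 0 gives the cubic
        -2*Sx2x2*b^3 - 3*Sx1x2*b^2 + (2*Sx2y - Sx1x1)*b + Sx1y = 0
    (Sx2x2 = sum(x2*x2), etc.), not a linear normal equation.
    """
    x1, x2, y = (np.asarray(v, dtype=float) for v in (x1, x2, y))
    cubic = [-2.0 * (x2 @ x2), -3.0 * (x1 @ x2), 2.0 * (x2 @ y) - x1 @ x1, x1 @ y]
    roots = np.roots(cubic)
    # keep the (numerically) real roots of the cubic
    real = roots[np.abs(roots.imag) <= 1e-8 * np.maximum(1.0, np.abs(roots))].real
    sse = [np.sum((y - b * x1 - b**2 * x2) ** 2) for b in real]
    return float(real[int(np.argmin(sse))])


x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 0.0, 1.0, 2.0])
y = 1.5 * x1 + 1.5**2 * x2
print(fit_tied_model(x1, x2, y))   # approximately 1.5
```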
Bob
> Good. So clearly if we focus on the parameter b, then the model y=b^2x
> is not a linear model, but it is a linear model if we focus on b^2 (and,
> to be precise, we impose the limit b^2>=0).
In the context of regression, a "parameter" in a function is a _constant_
whose value will be determined by the analysis. It is not a variable. So
saying "focus on b^2" is like saying "focus on 4". Every place where a
parameter occurs, imagine replacing the parameter by an arbitrary constant,
and then look at the function.
> So, what about the model
>
> y = b1 x1 + b1^2 x2
>
> where b1 is the parameter? Do you agree that this is non-linear?
No. Once the regression has computed the values of the parameters, this
might end up being:
y = 3*x1 + 9*x2
Wouldn't you agree that that is a linear function that defines a plane?
Now, if the parameter is an exponent of a variable such as:
y = x^b
you have a nonlinear regression, because x is no longer a linear term when
you replace b by an arbitrary constant.
Here are some examples:
Linear: y = a + b*x
Linear: y = a + b^2*x
Linear: y = b0 + b1*x1 + b2*x2
Nonlinear: y = a*x^2
Nonlinear: y = a*x^b
Nonlinear: y = a*sin(x)
Nonlinear: y = b0 + b1*x + b2*x^2
Notice the pattern: in the linear functions you can replace any parameter by
any constant and the function remains linear. For the nonlinear functions,
if you replace parameters by arbitrary constants, the function is not
linear. Just keep repeating "A parameter is a constant, a parameter is a
constant..."
--
Phil Sherrod
(phil.sherrod 'at' sandh.com)
http://www.dtreg.com (decision tree and SVM predictive modeling)
http://www.nlreg.com (nonlinear regression)
To be honest, I can't see the relevance of this. A model is linear if
it's linear in the parameters. But the parameters only become fixed if
you fit a model to a specific set of data. If you fit the same model to
a different set of data, the parameters may be different.
And, like it or not, this is the way linear models are defined (it's not
just K&S: I checked other sources as well the first time this came up).
So, you're swimming against the tide.
> > In the context of regression, a "parameter" in a function is a _constant_
> > whose value will be determined by the analysis. It is not a variable. So
> > saying "focus on b^2" is like saying "focus on 4". Every place where a
> > parameter occurs, imagine replacing the parameter by an arbitrary constant,
> > and then look at the function.
> >
> Ah, you're not a Bayesian, are you. :-)
>
> To be honest, I can't see the relevance of this. A model is linear if
> it's linear in the parameters.
No. It is linear if it is linear in the variables, not the parameters.
> But the parameters only become fixed if
> you fit a model to a specific set of data. If you fit the same model to
> a different set of data, the parameters may be different.
That's quite true. But changing the values of constants does not affect
whether the function is linear or not. If one regression yields the
function
y = 2 + 3*x
and you run it with a new set of data and get a new fitted function
y = 5.3 + 7.1*x
It is still a linear function. Changing the values of the constants doesn't
affect the linear/nonlinear state. Consider the general function:
y = a + b*x
Do you think any values of a and b would change this from a linear function
to a nonlinear function? If you will agree that it remains linear, then you
should be able to reason that a regression performed with this functional
model and any set of data will result in a linear function.
In contrast, the function
y = a + b*x^2
is nonlinear in x and y regardless of the choice of a and b except for the
special case of b=0 where y isn't a function of x at all.
Hi Phil,
Thanks for stepping in and explaining what had been explained a dozen
times to Anon Bob O'Hara; he just never got it, and did not bother
to re-read what had been thoroughly answered.
Thanks for explaining the whole thing to him again. I don't think
he'll ever understand it. I know I've personally explained the SAME
thing to him a dozen times or more.
-- Reef Fish Bob.
> Thanks for stepping in and explaining what had been explained a dozen
> times to Anon Bob O'Hara; he just never got it, and did not bother
> to re-read what had been thoroughly answered.
>
> Thanks for explaining the whole thing to him again. I don't think
> he'll ever understand it. I know I've personally explained the SAME
> thing to him a dozen times or more.
I'm going to take another shot at explaining it later tonight after I fortify
myself with dinner.
I'm beginning to have more empathy for your frustration at explaining this.
--
Phil Sherrod
(phil.sherrod 'at' sandh.com)
http://www.dtreg.com (decision tree and SVM modeling)
http://www.nlreg.com (nonlinear regression)
I was able to decode their formula by assuming that it was a linear function of these factors. I took the statistics from five quarterbacks and fit an exact fit multiple regression.
R = a + Sum(Bi*Xi)
The coefficients turned out to be nice and even. There was, however, a complicating issue. They set upper and lower limits on the contribution for each factor. The actual formula was
R = a + Sum(Bi*Zi),
where
Zi = Xi if Li < Xi < Ui
= Li if Xi <= Li
= Ui if Xi >= Ui
I was able to determine these limits by looking at the passing statistics from quarterbacks with only a few passing attempts.
So the QB rating formula is not only linear in the coefficients, but also linear in the Zi.
Jack
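Jack's exact-fit decoding can be sketched in Python. The coefficients, limits, and statistics below are made up; the post does not give the real rating constants, and none are claimed here. With k factors plus an intercept, k + 1 quarterbacks whose statistics all lie strictly inside the limits give a square linear system. Five quarterbacks are therefore enough to pin down an intercept and four weights. `clamp` mirrors the Zi definition above. `rating`, `solve`, and the data are hypothetical.

```python
from fractions import Fraction


def clamp(x, lo, hi):
    """Zi: the raw statistic Xi limited to the interval [Li, Ui]."""
    return max(lo, min(x, hi))


def rating(stats, a, B, L, U):
    """R = a + Sum(Bi*Zi), each Zi being Xi clamped to [Li, Ui]."""
    return a + sum(b * clamp(x, lo, hi) for b, x, lo, hi in zip(B, stats, L, U))


def solve(A, y):
    """Solve the square system A*c = y exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(t)] for row, t in zip(A, y)]
    for c in range(n):
        # First row with a nonzero entry in column c; StopIteration means singular.
        p = next(r for r in range(c, n) if M[r][c] != 0)
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]


# Hypothetical "unknown" formula: intercept, four weights, limits [0, 10].
a_true = Fraction(1, 2)
B_true = [Fraction(5), Fraction(-2), Fraction(3), Fraction(1, 4)]
L = [0, 0, 0, 0]
U = [10, 10, 10, 10]

# Five hypothetical quarterbacks, all statistics strictly inside the limits.
qbs = [[5, 5, 5, 5]] + [[6 if j == i else 5 for j in range(4)] for i in range(4)]
R = [rating(s, a_true, B_true, L, U) for s in qbs]

# Exact-fit regression: the design row is [1, Z1, ..., Z4] for each quarterback.
design = [[1] + [clamp(x, lo, hi) for x, lo, hi in zip(s, L, U)] for s in qbs]
coef = solve(design, R)
print(coef)
```

A statistic outside its limits contributes only the limit itself. Such a player would therefore not fit the unclamped formula R = a + Sum(Bi*Xi).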
Bob
I know. You have been totally confused because you don't know how
to read, don't know how to retrieve old posts, and don't seem to know
how to read what I cited with tinyurl.
> Reef Fish: are you agreeing with Phil that a
> linear model is linear in the covariates, and not the parameters?
Phil was giving examples of linear and nonlinear FUNCTIONS of X,
in linear models (linear combination of the parameters).
Bob, seriously, have you EVER been trained in statistics? I know
you're in a dept of math and stat, and some of those who teach
statistics in such departments were trained in mathematics but
not in statistics.
You REALLY really need to go back to school and take a course
in regression analysis. You have failed that subject ever since you
started in discussion threads back in June or earlier.
-- Reef Fish Bob.
To quote from an earlier post (my comment, followed by Phil's reply):
[quote]
>To be honest, I can't see the relevance of this. A model is linear if
> it's linear in the parameters.
No. It is linear if it is linear in the variables, not the parameters.
[/quote]
Do you agree with Phil or me on this? I thought you agreed with me
(plus or minus some details), but you now seem to be agreeing with Phil.
Bob
First of all, you don't even know what "_ad hominum_ attacks" are,
as in the logical fallacy of "Argumentum ad hominem" which means
you attack a person on issues UNRELATED to the issue being
discussed or debated.
In your case, I attacked what you POSTED on the issues
related to "linear models" and your repeated INABILITY to read
what had been explained to you (Anon Bob O'Hara) dozens of
times on the same ISSUE of "linear models".
> The problem is that you appear to have reversed your position.
My position and explanation had been exactly the same, throughout
the discussion of "linear models" in multiple regression, as I had
been TEACHING it, at the beginning graduate level, for 30 years.
Your statement is simply your self-incrimination on what I said about
you on the ISSUE of "linear models" which you falsely labeled as
ad hominem.
>
> To quote from an earlier post (my comment, followed by Phil's reply):
>
> [quote]
> >To be honest, I can't see the relevance of this. A model is linear if
> > it's linear in the parameters.
That was your (Bob O'Hara's) correct statement after Phil's post
which had explained to YOU that the parameter b^2 didn't matter in
a linear model:
Anon Bob> So, what about the model
Anon Bob> y = b1 x1 + b1^2 x2
Anon Bob> where b1 is the parameter? Do you agree that this is
Anon Bob> non-linear?
Phil> No. Once the regression has computed the values of the
Phil> parameters, this
Phil> might end up being:
Phil> y = 3*x1 + 9*x2
and then proceeded to point out the difference between a linear
FUNCTION of the Xs and a nonlinear FUNCTION of the X's
Phil> Linear: y = a + b*x
Phil> Linear: y = a + b^2*x
Phil> Linear: y = b0 + b1*x1 + b2*x2
Phil> Nonlinear: y = a*x^2
Phil> Nonlinear: y = a*x^b
Phil> Nonlinear: y = a*sin(x)
Phil> Nonlinear: y = b0 + b1*x + b2*x^2
Phil> Notice the pattern: in the linear functions you can replace any
Phil> parameter by any constant and the function remains linear.
So, clearly Phil understood what a "linear MODEL" in regression is,
and that a non-linear form of expression of the parameters of a
linear model does not change its linearity in the MODEL.
Granted, Phil did make one statement that is at best ambiguous
about the concept he explained above:
> No. It is linear if it is linear in the variables, not the parameters.
> [/quote]
That is one of his ambiguous statements. I took it that he was
addressing the linearity of a FUNCTION, such as a fitted
polynomial, which started this whole re-hash. A polynomial is
not a linear FUNCTION of X because it is not linear in the
variables.
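A concrete case: the polynomial y = b0 + b1*x + b2*x^2 is not a straight-line function of x. Its least-squares fit still needs nothing beyond the normal equations (X'X)b = X'y, because each fitted value is a linear combination of b0, b1, b2 with the powers of x as known weights. The Python below is a minimal sketch on hypothetical noiseless data, so the fit recovers the generating coefficients exactly. The function names are illustrative.

```python
from fractions import Fraction


def solve(A, y):
    """Solve the square system A*c = y exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(t)] for row, t in zip(A, y)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # StopIteration => singular
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]


def fit_polynomial(xs, ys, degree):
    """Least-squares fit of y = b0 + b1*x + ... + bp*x^p via the normal equations.

    The model is nonlinear in x but linear in the b's: each fitted value is
    sum_j b_j * x**j, with the powers of x acting as known weights.
    """
    X = [[Fraction(x) ** j for j in range(degree + 1)] for x in xs]  # design matrix
    p = degree + 1
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * Fraction(y) for row, y in zip(X, ys)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical noiseless data from y = 1 - 2x + 3x^2.
xs = [-2, -1, 0, 1, 2]
ys = [1 - 2 * x + 3 * x ** 2 for x in xs]
print(fit_polynomial(xs, ys, 2))
```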
>
Anon Bob> Do you agree with Phil or me on this? I thought you
Anon Bob> agreed with me (plus or minus some details), but
Anon Bob> you now seem to be agreeing with Phil.
My post was my reply to your question, in the context of what
had been explained by Phil, and what had been explained by
ME, in the past decades, and throughout the "linear model"
and "polynomial regression" threads in sci.stat.math.
I'll repeat verbatim what I said:
RF> Phil was giving examples of linear and nonlinear FUNCTIONS of X,
RF> in linear models (linear combination of the parameters).
RF> Bob, seriously, have you EVER been trained in statistics? I know
RF> you're in a dept of math and stat, and some of those who teach
RF> statistics in such departments were trained in mathematics but
RF> not in statistics.
Bob, why don't you answer my question about your training in
STATISTICS. If so, at what university and at what level?
RF> You REALLY really need to go back to school and take a course
RF> in regression analysis. You have failed that subject ever since you
RF> started in discussion threads back in June or earlier.
The above comment is really independent of your statistical training;
it was a statement of FACT about your failure in the issue of
understanding regression and linear model in STATISTICS, as
PROVEN in your dozens of posts in sci.stat.math.
Any competent statistician would have known that, whether they
said it or not, after countless posts on the subject,
summarized by me in a post in June, about which I had said
to you:
RF> The post below was referenced a dozen times in
RF> recent discussion of linear models:
RF> What was in there should have answered your question then, and now:
As of June 14, you (Bob O'Hara) and Richard Ulrich were the ONLY
ones in the newsgroups who had participated in the "linear models"
discussion who are STILL in the fog about what a linear model is!
The latter was amply proven in your (and Richard Ulrich's) posts on
the subject of "polynomial regression" in December 2005, up to NOW.
-- Reef Fish Bob.
Not Argumentum ad Hominem at all.
-- Reef Fish Bob.
I would like to stick to the issue of what a linear model is, please.
>>To quote from an earlier post (my comment, followed by Phil's reply):
>>
>>[quote]
>> >To be honest, I can't see the relevance of this. A model is linear if
>> > it's linear in the parameters.
>
>
> That was your (Bob O'Hara's) correct statement after Phil's post
> which had explained to YOU that the parameter b^2 didn't matter in
> a linear model:
>
Good, thank you.
> Anon Bob> So, what about the model
> Anon Bob> y = b1 x1 + b1^2 x2
>
> Anon Bob> where b1 is the parameter? Do you agree that this is
> Anon Bob> non-linear?
>
> Phil> No. Once the regression has computed the values of the
> Phil> parameters, this
> Phil> might end up being:
>
> Phil> y = 3*x1 + 9*x2
>
> and then proceeded to point out the difference between a linear
> FUNCTION of the Xs and a nonlinear FUNCTION of the X's
>
> Phil> Linear: y = a + b*x
> Phil> Linear: y = a + b^2*x
> Phil> Linear: y = b0 + b1*x1 + b2*x2
>
> Phil> Nonlinear: y = a*x^2
> Phil> Nonlinear: y = a*x^b
> Phil> Nonlinear: y = a*sin(x)
> Phil> Nonlinear: y = b0 + b1*x + b2*x^2
>
> Phil> Notice the pattern: in the linear functions you can replace any
> Phil> parameter by any constant and the function remains linear.
>
> So, clearly Phil understood what a "linear MODEL" in regression is,
> and that a non-linear form of expression of the parameters of a
> linear model does not change its linearity in the MODEL.
>
But surely a model such as y = a*sin(x) is a linear model! I would have
thought that if Phil understood what a linear model was (at least
according to the definition I gave above, that we agree on), he would not
have listed it as nonlinear. This should be taken up with Phil, though, to
clarify what he thinks.
Right, to get back to the question of what a linear model is, do you
think that this is a linear model:
y = a + b*x1 + b^2*x2?
It IS relevant to how you could have been so obtuse in understanding
the basic issue after it had been explained to you a dozen times!
> I will therefore continue to ignore these irrelevancies.
>
> I would like to stick to the issue of what a linear model is, please.
Just REVIEW what had already been posted, re-posted, and re-re-
explained a dozen or more times!
>
> >>To quote from an earlier post (my comment, followed by Phil's reply):
> >>
> >>[quote]
> >> >To be honest, I can't see the relevance of this. A model is linear if
> >> > it's linear in the parameters.
> >
> >
> > That was your (Bob O'Hara's) correct statement after Phil's post
> > which had explained to YOU that the parameter b^2 didn't matter in
> > a linear model:
> >
> Good, thank you.
Actually I gave you credit for an AMBIGUOUS answer which could be said
to be INCORRECT, as I had given Phil the same benefit of the doubt.
Your statement, if correctly stated, should have been:
"A model is linear if it's a "linear combination" of the parameters"
Your use of "linear" made it ambiguous at best, and wrong as stated.
"A regression model is a linear model if it is a linear combination of its
parameters -- which may be in linear form, or a nonlinear
function of an arbitrary parametrization".
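One mechanical reading of "linear combination of the parameters" is superposition in the parameter vector. For every fixed x, the prediction must satisfy f(x; alpha*p + gamma*q) = alpha*f(x; p) + gamma*f(x; q). The Python below spot-checks this at a single x for several forms from the thread: a + b*x, a*sin(x), the quadratic polynomial, and a*x^b. The helper and model names are hypothetical. This is a numerical check under assumed inputs, not a proof. Passing at one point is necessary but not sufficient.

```python
import math


def is_linear_in_parameters(model, x, p, q, alpha=2.0, gamma=-3.0, tol=1e-9):
    """Spot-check superposition in the parameters at a single x:
    model(x, alpha*p + gamma*q) == alpha*model(x, p) + gamma*model(x, q)."""
    combo = [alpha * pi + gamma * qi for pi, qi in zip(p, q)]
    lhs = model(x, combo)
    rhs = alpha * model(x, p) + gamma * model(x, q)
    return math.isclose(lhs, rhs, rel_tol=tol, abs_tol=tol)


def line(x, p):          # y = a + b*x
    a, b = p
    return a + b * x


def sine(x, p):          # y = a*sin(x)
    (a,) = p
    return a * math.sin(x)


def quadratic(x, p):     # y = b0 + b1*x + b2*x^2
    b0, b1, b2 = p
    return b0 + b1 * x + b2 * x ** 2


def power(x, p):         # y = a*x^b
    a, b = p
    return a * x ** b


checks = [
    ("a + b*x", line, 2.0, [1.0, 1.0], [1.0, 2.0]),
    ("a*sin(x)", sine, 0.7, [1.5], [-0.5]),
    ("b0 + b1*x + b2*x^2", quadratic, 1.3, [1.0, 2.0, 3.0], [0.5, -1.0, 4.0]),
    ("a*x^b", power, 2.0, [1.0, 1.0], [1.0, 2.0]),
]
for name, model, x, p, q in checks:
    print(name, is_linear_in_parameters(model, x, p, q))
```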
>
> > Anon Bob> So, what about the model
> > Anon Bob> y = b1 x1 + b1^2 x2
> >
> > Anon Bob> where b1 is the parameter? Do you agree that this is
> > Anon Bob> non-linear?
> >
> > Phil> No.
That is correct!
It is NOT nonlinear, because, as had been explained a dozen times,
it is a LINEAR model with a constraint on the parameters.
RF> The post below was referenced a dozen times in
RF> recent discussion of linear models:
Do you know how to retrieve ANY old post? You are just wasting
bandwidth and everyone's time because of your own lack of proper
statistical training and your inability to review concepts that had been
repeatedly and thoroughly covered.
> > Phil> Linear: y = a + b*x
> > Phil> Linear: y = a + b^2*x
> > Phil> Linear: y = b0 + b1*x1 + b2*x2
> >
> > Phil> Nonlinear: y = a*x^2
> > Phil> Nonlinear: y = a*x^b
> > Phil> Nonlinear: y = a*sin(x)
> > Phil> Nonlinear: y = b0 + b1*x + b2*x^2
> >
> > Phil> Notice the pattern: in the linear functions you can replace any
> > Phil> parameter by any constant and the function remains linear.
> >
> > So, clearly Phil understood what a "linear MODEL" in regression is,
> > and that a non-linear form of expression of the parameters of a
> > linear model does not change its linearity in the MODEL.
> >
> But surely a model such as y = a*sin(x) is a linear model! I would have
> thought that if Phil understood what a linear model was (at least
> according to the definition I gave above, that we agree on), he would not
> have listed it as nonlinear. This should be taken up with Phil, though, to
> clarify what he thinks.
That's how I took what Phil thinks, because he had already
explained to YOU what a linear model in regression is! He was
explaining what a linear FUNCTIONAL model in X is, to you.
>
> Right, to get back to the question of what a linear model is, do you
> think that this is a linear model:
>
> y = a + b*x1 + b^2*x2?
Of course it is a LINEAR model, with constraints on one of the
parameters. If you didn't get it (or don't know how to find it), the
subject of linear models with constraints was thoroughly discussed
in several other threads in sci.stat.math. I did a Google search
just now, with keywords "linear models with constraints",
and author "reef fish" and found 10 THREADS, the first of which
in Google's hits was:
Date: 17 Jun 2005 08:04:41 -0700
Local: Fri, Jun 17 2005 10:04 am
Subject: Linear Regression models with CONSTRAINTS on the Parameters, Part II
13 posts, Anon Bob O'Hara in it 4 times, apparently learning NOTHING.
The 2nd google hit was:
Date: 14 Jun 2005 20:25:25 -0700
Local: Tues, Jun 14 2005 10:25 pm
Subject: Re: Linear Regression models with CONSTRAINS on the parameters, Part I
19 posts, Anon Bob O'Hara in it 6 times, apparently learning NOTHING.
The 3rd google hit was:
Date: 2 Jun 2005 05:56:34 -0700
Local: Thurs, Jun 2 2005 7:56 am
Subject: Re: What are LINEAR or LINEAR REGRESSION models?
231 messages, 24 authors, and Anon Bob O'Hara in it several times,
apparently learning NOTHING.
Bob, why don't you start with THOSE threads instead of asking
your stupid questions (for one who had supposedly been reading
and participating in those threads). There were 7 other threads
in which I explained the issue of "linear models with constraints"
which was why google found the 10 threads with "reef fish" as
author among the 34 threads with "linear models" with me as
author.
-- Reef Fish Bob.
From the 30th (i.e. yesterday), I gave a direct quote from K&S:
"Before proceeding further, it is well to emphasise the meaning of the
adjective "linear" in the general regression model (28.59): it is
_linear in the parameters beta_i, not necessarily in the x's. Up to
28.11, on the other hand, we understood by "linear regression" that the
conditional mean value of y is a linear function of the regressors
x_1,...,x_p. From the point of view of our present (Least Squares)
analysis, the latter (perhaps more "natural") definition of linearity is
irrelevant; it is linearity in the parameters that is essential."
(emphasis in the original, all typos added)."
To which you replied:
"That definition is identical to the definition in every book in linear
models and multiple regression I've ever read or taught from. The
K&S definition had been discussed at length, and everyone except
you knows that NO ONE would disagree with that
definition, and if they did, they would be wrong."
Note that K&S make no mention of a linear regression being a linear
combination, and you admit to never having seen a different definition.
So, you are now telling me that my "correct" definition would be
different to everybody else's.
<snip>
>>Right, to get back to the question of what a linear model is, do you
>>think that this is a linear model:
>>
>>y = a + b*x1 + b^2*x2?
>
>
> Of course it is a LINEAR model, with constraints on one of the
> parameters. If you didn't get it (or don't know how to get) in
>
We'll have to come back to this when we've sorted out what the actual
definition of a linear model is.
I'm going through this because your position seems to me to be
inconsistent, so I want to clarify it to make sure I haven't
misunderstood what your position is.
<snip>
Bob
--
Bob O'Hara
Department of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 51479
>
> Actually I gave you credit for an AMBIGUOUS answer which could be said
> to be INCORRECT, as I had given Phil the same benefit of the doubt.
>
> Your statement, if correctly stated, should have been:
>
> "A model is linear if it's a "linear combination" of the parameters"
>
> Your use of "linear" made it ambiguous at best, and wrong as stated.
>
> "A regression model is a linear model if it is a linear combination of its
> parameters -- which may be in linear form, or a nonlinear
> function of an arbitrary parametrization".
Bob, please clarify some terminology for me. Are the "parameters"
mentioned above the regression coefficients? If so, I find it odd to
describe a regression model as a linear combination of the regression
coefficients.
Here is a pretty typical definition of "linear combination":
linear combination - A sum of values each multiplied by some coefficient
[or weight]. A linear combination can be expressed as the inner product
of two vectors, one representing the data and the other a vector of
coefficients.
(from http://life.bio.sunysb.edu/morph/glossary/gloss2.html)
I think it is pretty standard to talk about linear combinations of
*variables*, but not of the weights or coefficients.
So, as I understand it, a regression model is "linear in the parameters"
if Y-hat = a linear combination of *explanatory variables* (plus
constant), not of the parameters. The parameters (i.e., regression
coefficients) are the weights in the weighted sum of variables.
Cheers,
Bruce
--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
Let's cut to the essential points leading to the present dead-horse
you kept dragging out of the grave. First,
> >>
> >>Indeed. My statistical education and competence is irrelevant to the
> >>definition of a linear model.
> >
> > It IS relevant to how you could have been so obtuse in understanding
> > the basic issue after it had been explained to you a dozen times!
Answer THAT question first, about your background in STATISTICAL
education.
> >>I would like to stick to the issue of what a linear model is, please.
You will be ignored UNTIL you have answered the above question,
and whether you have read the post in the tinyurl mentioned several
times the past few days; or any of the references (in my post
immediately preceding this) I pointed you to.
Until then, you'll be justly ignored, as I should have done 20 posts
ago.
Do Your HOME WORK first, Anon Bob O'Hara.
You are just like some of my undergrad students who flunked the
first course in stat. because they missed classes or fell asleep for
the first 10 weeks of a semester, and then started asking questions
about how to use the normal tables when that had been thoroughly
covered and repeatedly done weeks and months before.
-- Reef Fish Bob.
You ignored ALL my pointers to replies that were already given to
your questions, and did not say whether you have read the post below:
Odd as it may seem to you (and that has been the main source of
confusion for many) that's exactly how a "linear model" is defined
by EVERYONE in statistics -- a linear combination of the
regression coefficients!
You weren't in those threads in sci.stat.math, which I pointed
Bob O'Hara to (earlier today) to re-read because he was there when
the explanations were made.
>
> Here is a pretty typical definition of "linear combination":
>
> linear combination - A sum of values each multiplied by some coefficient
> [or weight]. A linear combination can be expressed as the inner product
> of two vectors, one representing the data and the other a vector of
> coefficients.
> (from http://life.bio.sunysb.edu/morph/glossary/gloss2.html)
Of course that's the definition of a "linear combination". No one
needs that web page to explain that simple notion.
The stumbling block for many is not knowing WHICH of the vectors
in the inner product plays the role of "coefficient" in the linear
combination, and which plays the role of "variable", in two
DIFFERENT definitions!!! <---- that's the key to unlocking
the mystery to those who had not been exposed to the statistical
definitions of "linear MODEL" and "linear FUNCTION".
>
> I think it is pretty standard to talk about linear combinations of
> *variables*, but not of the weights or coefficients.
You CAN talk about a linear combination of the variables as
a FUNCTION of the variables.
But in the definition of a "linear model", it is a linear combination
of the regression coefficients (being the variables to be estimated)
while the variables (X's) play the role of the coefficients in that
linear combination!
It's an "unnatural" concept for the statistical package users, to be
sure. But so is the concept of INDEPENDENCE in "linear
independence" and "stochastic independence" in the post I made
on why those words "linear" and "independence" are the most
often confused by those technically untrained in statistics.
> So, as I understand it, a regression model is "linear in the parameters"
> if Y-hat = a linear combination of *explanatory variables* (plus
> constant), not of the parameters. The parameters (i.e., regression
> coefficients) are the weights in the weighted sum of variables.
That's 100% incorrect about a "linear MODEL" in a regression.
That is, the regression coefficients AND the explanatory variables
both play a DUAL ROLE, in two different definitions.
The regression coefficients play the role of VARIABLES (and the
X's coefficients) in the linear combination in a LINEAR MODEL
definition, which is universal.
The regression coefficients play the role of COEFFICIENTS (and
the X's variables) in the linear combination of a FUNCTION of
the regressors, which may or may not be linear!
That is why a polynomial regression is a linear MODEL, but not
a linear FUNCTION of X.
Your misunderstanding of these concepts is typical of those
without any training in the theory of statistics and the definition
and usage of statistical terms.
But as Walter Cronkite used to say at the conclusion of his news
broadcast, "That's the way it is".
-- Reef Fish Bob.
That is indeed how I interpreted your statement that "A model is linear
if it's a "linear combination" of the parameters". I just wanted to be
sure that you really meant parameters, and not variables. Thanks for
clarifying that you did.
>
> The regression coefficients play the role of COEFFICIENTS (and
> the X's variables) in the linear combination of a FUNCTION of
> the regressors, which may or may not be linear!
>
> That is why a polynomial regression is a linear MODEL, but not
> a linear FUNCTION of X.
>
I have no problem with that. But as you say, even if the functional
relationship is not linear, you still have a linear combination of
coefficients (the regression coefficients) and variables (the X's).
>
> Your misunderstanding of these concepts is typical of those
> without any training in the theory of statistics and the definition
> and usage of statistical terms.
One more question then, in hopes of furthering my education. ;-)
With respect to the "dual roles" you have described for regression
coefficients and variables (the X's), is it possible to have a linear
combination of *parameters* when one treats the X's as coefficients, but
NOT have a linear combination of *variables* when one treats the
regression coefficients as coefficients?
Putting it another way, can you ever have a linear model (i.e., linear
combination of parameters) for which Y-hat is NOT a linear combination
of the X's (allowing for X2 to be equal to X1-squared, etc)?
So, to try and get back to the subject, do you think that the model
y = b*x1 + b^2*x2
is a linear model?
First of all, a Happy New 2006 to all. I am going to put the "linear
models" discussion to rest, unless someone has a really new
question and not old horses that had been beaten to death in 2005.
Your question above sounds complicated and is likely to lead to more
confusion if I answer you in those terms, so I'll answer you in the
rephrased form.
The answer had actually been given (but not beaten to death <G>)
if you just think about the linear MODELS that are not linear
FUNCTIONS (hence not a linear combination) of the variables X's.
> Putting it another way, can you ever have a linear model (i.e., linear
> combination of parameters) for which Y-hat is NOT a linear combination
> of the X's (allowing for X2 to be equal to X1-squared, etc)?
>
> Cheers,
> Bruce
Actually, any general quadratic (or higher-dimensional) surface in the
X-space is an example of that.
1. They are all LINEAR models that can easily be fit by LS in a
regression.
2. The cross-product terms necessary for the general form cannot be
written as a linear combination of the variables X's. E.g.,
Y = b0 + b1 X1 + b2 X2 + b3 X1^2 + b4 X2^2 + b5 X1X2
is the 3-D quadratic surface in X1 and X2.
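Point 1 can be illustrated directly. Treat 1, X1, X2, X1^2, X2^2, and X1X2 as six regressors, and the quadratic surface is fit by ordinary least squares through the normal equations. The Python below assumes hypothetical coefficients and a 3 x 3 grid of design points, which is enough to make the six columns linearly independent. The responses are noiseless, so the solution reproduces the coefficients exactly. All names are illustrative.

```python
from fractions import Fraction


def solve(A, y):
    """Solve the square system A*c = y exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(t)] for row, t in zip(A, y)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # StopIteration => singular
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]


def quad_surface_row(x1, x2):
    """Regressors v0..v5 = 1, X1, X2, X1^2, X2^2, X1*X2 for one observation."""
    x1, x2 = Fraction(x1), Fraction(x2)
    return [Fraction(1), x1, x2, x1 ** 2, x2 ** 2, x1 * x2]


def least_squares(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical true coefficients b0..b5 and a 3 x 3 grid of design points.
b_true = [1, 2, -1, Fraction(1, 2), 3, -2]
points = [(x1, x2) for x1 in (-1, 0, 1) for x2 in (-1, 0, 1)]
X = [quad_surface_row(x1, x2) for x1, x2 in points]
y = [sum(b * v for b, v in zip(b_true, row)) for row in X]  # noiseless responses

print(least_squares(X, y))
```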
I posted what's below on Nov 10, in explaining to schwail how he should
proceed with his problem because of all the bad advice he got from RU:
RF> To explore even the quadratic surface fit of Y on X1, ..., X4, you need
RF> to START with all of these terms as your independent variables:
RF> X1, X2, ...X4, and all squared terms of Xi, and all cross-product
RF> terms such as X1X2, X1X3, ... X3X4, because ALL of them may
RF> be needed for the quadratic surface your data may fit. More likely
RF> than not, many of those terms are NOT needed in your fit, but you
RF> have to start with them before you know which you need and
RF> which you don't, instead of just blindly keeping the three linear
RF> terms in X1, X2, and X4.
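The full set of starting terms described there can be enumerated mechanically. The Python below uses hypothetical helper names. It lists the linear, squared, and cross-product terms for X1, ..., X4: 4 + 4 + 6 = 14 regressors, plus the intercept. It then expands one observation into the corresponding row of the design matrix.

```python
from itertools import combinations


def quadratic_surface_terms(names):
    """Names of the linear, squared, and cross-product terms of a full quadratic surface."""
    linear = list(names)
    squares = [f"{n}^2" for n in names]
    cross = [f"{a}{b}" for a, b in combinations(names, 2)]
    return linear + squares + cross


def expand_row(values):
    """One design-matrix row: intercept, then the terms in the same order as the names."""
    linear = list(values)
    squares = [v * v for v in values]
    cross = [a * b for a, b in combinations(values, 2)]
    return [1] + linear + squares + cross


terms = quadratic_surface_terms(["X1", "X2", "X3", "X4"])
print(len(terms), terms)
print(expand_row([1, 2, 3, 4]))
```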
You may already be familiar with the fact that ANOVA, MANOVA, and other
procedures for comparing means can be set up as a linear (regression)
model in which all of the X's are Indicator (some call it Dummy)
variables (with values 0 or 1). The cross-product terms would be
the two-way interaction of the "treatments". There could be higher
interactions also. Of course NONE of these linear models can be
expressed as a linear combination of the main effects.
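The indicator-variable setup can be sketched in Python for a hypothetical 2 x 2 design with two observations per cell. The columns are 1, A, B, and the cross-product A*B, which carries the two-way interaction. With the interaction included, the least-squares fitted value in each cell is the cell mean. The interaction coefficient is the difference of differences of the cell means. The data and names are made up for illustration.

```python
from fractions import Fraction


def solve(A, y):
    """Solve the square system A*c = y exactly by Gauss-Jordan elimination."""
    n = len(A)
    M = [[Fraction(v) for v in row] + [Fraction(t)] for row, t in zip(A, y)]
    for c in range(n):
        p = next(r for r in range(c, n) if M[r][c] != 0)  # StopIteration => singular
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                factor = M[r][c]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[c])]
    return [row[n] for row in M]


def least_squares(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Hypothetical 2 x 2 layout: (A, B, response), two observations per cell.
data = [
    (0, 0, 3), (0, 0, 5),
    (1, 0, 6), (1, 0, 8),
    (0, 1, 5), (0, 1, 5),
    (1, 1, 10), (1, 1, 12),
]
# Columns: intercept, indicator A, indicator B, and the cross-product A*B.
X = [[Fraction(1), Fraction(a), Fraction(b), Fraction(a * b)] for a, b, _ in data]
y = [Fraction(v) for _, _, v in data]
coef = least_squares(X, y)


def fitted(a, b):
    """Fitted value for cell (A=a, B=b)."""
    return coef[0] + coef[1] * a + coef[2] * b + coef[3] * a * b


print(coef)
print({(a, b): fitted(a, b) for a in (0, 1) for b in (0, 1)})
```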
-- Reef Fish Bob.
Agreed.
>
> 2. The cross-product terms necessary for the general form cannot be
> written as a linear combination of the variables X's. E.g.,
>
> Y = b0 + b1 X1 + b2 X2 + b3 X1^2 + b4 X2^2 + b5 X1X2
>
> is the 3-D quadratic surface in X1 and X2.
Okay, this is getting to the heart of my question. It boils down to
what is the definition of "linear combination". I have been thinking
of linear combination as a weighted sum of k variables:
linear combination = w0(v0) + w1(v1) + w2(v2) + w3(v3)...+wk(vk)
For the model you give above, I would have said:
w0 = b0, v0 = 1
w1 = b1, v1 = X1
w2 = b2, v2 = X2
w3 = b3, v3 = X1^2
w4 = b4, v4 = X2^2
w5 = b5, v5 = X1*X2
And I would have called that a "linear combination" of the v's.
(That's what I meant when I said, allowing for X2 to be equal to
X1-squared, etc).
But from what you've said (here, and in other posts), I need to revise
my notion of linear combination to disallow v's that are functions of
other v's. Is that right?
Linear combination -- a weighted sum of variables (v's) where:
1. No v is a function of any other v, BUT
2. weights can be functions of other weights.
Point 2 follows from your reversing the roles of b's and X's in an
earlier post. You say this is a linear model because there is a linear
combination of b's (with the X's as weights). It follows, therefore,
that it is permissible for the w's in a linear combination to be
functions of other w's.
> I posted what's below on Nov 10, in explaining to schwail how he should
> proceed with his problem because of all the bad advice he got from RU:
>
>
> RF> To explore even the quadratic surface fit of Y on X1, ..., X4, you need
> RF> to START with all of these terms as your independent variables:
>
> RF> X1, X2, ...X4, and all squared terms of Xi, and all cross-product
> RF> terms such as X1X2, X1X3, ... X3X4, because ALL of them may
> RF> be needed for the quadratic surface your data may fit. More likely
> RF> than not, many of those terms are NOT needed in your fit, but you
> RF> have to start with them before you know which you need and
> RF> which you don't, instead of just blindly keeping the three linear
> RF> terms in X1, X2, and X4.
No worries there. Aiken & West (1991) call that a hierarchical
step-down method, FWIW.
>
>
> You may already be familiar with the fact that ANOVA, MANOVA, and other
> procedures for comparing means can be set up as a linear (regression)
> model in which all of the X's are Indicator (some call it Dummy)
> variables (with values 0 or 1). The cross-product terms would be
> the two-way interaction of the "treatments". There could be higher
> interactions also. Of course NONE of these linear models can be
> expressed as a linear combination of the main effects.
Yes, I am aware of that. The issue, as noted above, comes down to the
definition of "linear combination", and whether any of the v's can be
functions of other v's.
>
> -- Reef Fish Bob.
Bruce, you are REGRESSING! You cited some web page for a definition
of what a linear combination is, and I said something to the effect that
everyone should know that -- so why cite a web page?
A linear combination is the dot product of two vectors (arrays if you
wish).
One of the arrays can play the role of the "variables" in one definition,
and the role of the "coefficients" in a different definition.
So what's your problem?
> I have been thinking
> of linear combination as a weighted sum of k variables:
>
> linear combination = w0(v0) + w1(v1) + w2(v2) + w3(v3)...+wk(vk)
Exactly!
>
> For the model you give above, I would have said:
>
> w0 = b0, v0 = 1
> w1 = b1, v1 = X1
> w2 = b2, v2 = X2
> w3 = b3, v3 = X1^2
> w4 = b4, v4 = X2^2
> w5 = b5, v5 = X1*X2
That is exactly what I meant in whatever I said.
>
> And I would have called that a "linear combination" of the v's.
> (That's what I meant when I said, allowing for X2 to be equal to
> X1-squared, etc).
>
> But from what you've said (here, and in other posts), I need to revise
> my notion of linear combination to disallow v's that are functions of
> other v's. Is that right?
There is nothing to revise. You are creating stumbling blocks for
yourself. Y = w0 v0 + w1 v1 + ... + w5 v5, with
> w0 = b0, v0
> w1 = b1, v1
> w2 = b2, v2
> w3 = b3, v3
> w4 = b4, v4
> w5 = b5, v5
which is a linear combination of the vs (as in a linear FUNCTION of
the vs), and a linear combination of the bs (as in a linear MODEL).
What difference does it make what the v's are, as long as the v's
are "linearly independent" in the linear models context.
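The "linearly independent" requirement concerns linear relations among the columns, not functional relations. The Python below is a sketch with an exact rank computation; the helper names are hypothetical. On five x-values, the columns 1, x, x^2 have rank 3 even though x^2 is a function of x. Replacing x^2 with the column 2*x leaves the rank at 2, so that design could not be fit uniquely.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix (list of rows) by exact Gaussian elimination."""
    M = [[Fraction(v) for v in row] for row in rows]
    n_rows, n_cols = len(M), len(M[0])
    r = 0
    for c in range(n_cols):
        pivot = next((i for i in range(r, n_rows) if M[i][c] != 0), None)
        if pivot is None:
            continue  # no usable pivot in this column
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, n_rows):
            factor = M[i][c] / M[r][c]
            M[i] = [vi - factor * vr for vi, vr in zip(M[i], M[r])]
        r += 1
        if r == n_rows:
            break
    return r


xs = [-2, -1, 0, 1, 2]
quadratic_design = [[1, x, x ** 2] for x in xs]  # x^2 depends on x, but not linearly
dependent_design = [[1, x, 2 * x] for x in xs]   # 2*x is a multiple of the x column
print(rank(quadratic_design), rank(dependent_design))
```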
> Linear combination -- a weighted sum of variables (v's) where:
>
> 1. No v is a function of any other v,
Says who? As a regression function of Y on the independent variables
v's, we have seen countless examples in which the v's are
powers of other v's, nonlinear functions of other v's, or
products of any combination of them. All of those are perfectly
good (and very common) examples of LINEAR MODELS (in regression)!
> BUT
> 2. weights can be functions of other weights.
In the case above, it's a non-problem or no-brainer because the v's
(and x's) only play the role of coefficient in the DEFINITION of a
"linear combination". You are not solving or estimating those
weights in a regression, because they are given DATA.
In the other linear combination, if the b's are functions of other b's,
we simply have a linear model with CONSTRAINTS on the
estimated coefficients! In fact, this is a very good way to
distinguish the two cases of "linear model" and "linear function"
in the regression context.
> Point 2 follows from your reversing the roles of b's and X's in an
> earlier post. You say this is a linear model because there is a linear
> combination of b's (with the X's as weights). It follows, therefore,
> that it is permissible for the w's in a linear combination to be
> functions of other w's.
See my explanation in the two preceding paragraphs.
>
>
> > I posted what's below on Nov 10, in explaining to schwail how he should
> > proceed with his problem because of all the bad advice he got from RU:
> >
> >
> > RF> To explore even the quadratic surface fit of Y on X1, ..., X4, you need
> > RF> to START with all of these terms as your independent variables:
> >
> > RF> X1, X2, ...X4, and all squared terms of Xi, and all cross-product
> > RF> terms such as X1X2, X1X3, ... X3X4, because ALL of them may
> > RF> be needed for the quadratic surface your data may fit. More likely
> > RF> than not, many of those terms are NOT needed in your fit, but you
> > RF> have to start with them before you know which you need and
> > RF> which you don't, instead of just blindly keeping the three linear
> > RF> terms in X1, X2, and X4.
>
> No worries there. Aiken & West (1991) call that a hierarchical
> step-down method, FWIW.
Which is irrelevant to the MODEL! That would be a way of ESTIMATING
the regression coefficients in a hierarchical fashion because of the
meaning attached to the quadratic surfaces of the FUNCTION.
No one needs Aiken or West for their term in the DEFINITION of
what constitutes a general quadratic surface.
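The following sketch builds the full starting set of terms described in the quoted advice. For four predictors, the second-order design matrix holds an intercept, X1..X4, their four squares and the six cross-products, for 15 columns in all. The function name `quadratic_design` and the random data are hypothetical, chosen only for the example.

```python
import numpy as np
from itertools import combinations

def quadratic_design(X, names):
    """Full second-order design: intercept, linear terms, squares, cross-products.

    X is an (n, k) array; names holds the k predictor labels.
    Returns the (n, 1 + 2k + k(k-1)/2) design matrix and its column labels.
    """
    n, k = X.shape
    cols, labels = [np.ones(n)], ["1"]
    for j in range(k):                      # linear terms
        cols.append(X[:, j])
        labels.append(names[j])
    for j in range(k):                      # squared terms
        cols.append(X[:, j] ** 2)
        labels.append(f"{names[j]}^2")
    for i, j in combinations(range(k), 2):  # cross-product terms
        cols.append(X[:, i] * X[:, j])
        labels.append(f"{names[i]}{names[j]}")
    return np.column_stack(cols), labels

# Hypothetical predictor values, just to show the shape of the result.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))
D, labels = quadratic_design(X, ["X1", "X2", "X3", "X4"])
print(D.shape)   # (50, 15)
print(labels)
```

Choosing which of the 14 non-intercept terms to keep is a separate model-selection step. The sketch only builds the starting set.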
> >
> > You may already be familiar with the fact that ANOVA, MANOVA, and other
> > procedures for comparing means can be set up as a linear (regression)
> > model in which all of the X's are Indicator (some call them Dummy)
> > variables (with values 0 or 1). The cross-product terms would be
> > the two-way interactions of the "treatments". There could be higher-order
> > interactions also. Of course NONE of these linear models can be
> > expressed as a linear combination of the main effects.
>
> Yes, I am aware of that. The issue, as noted above, comes down to the
> definition of "linear combination", and whether any of the v's can be
> functions of other v's.
You are WRONG in the respect that the definition of "linear combination"
is a concept in linear algebra that is always the same, independent of
any definition in regression that depends on the use of that term.
In a regression context, the v's can ALWAYS be functions of other v's.
That's pre-kindergarten stuff in the meaning of a regression FUNCTION.
When the regression coefficients (the b's) are functions of other b's,
then you can no longer express the model as a linear model, e.g.,
If b3 = b1 * b2, then the model can no longer be written as a
linear combination of b1, b2, and b3, and it becomes a
non-linear MODEL.
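One way to see the distinction numerically is to test superposition in the coefficients. A model is linear in its parameters if f(x; b + c) = f(x; b) + f(x; c) and f(x; a*b) = a*f(x; b). The sketch assumes NumPy, and the helper `is_linear_in_params` and the sample points are hypothetical. A random-trial check like this can only refute linearity, not prove it, but it separates the two cases here. A free coefficient on X1*X2 passes, while forcing that coefficient to equal b1*b2 fails.

```python
import numpy as np

def is_linear_in_params(f, x, n_params, trials=20, seed=0):
    """Numerically test additivity and homogeneity of f(x, b) in b."""
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        b, c = rng.normal(size=n_params), rng.normal(size=n_params)
        a = rng.normal()
        if not np.allclose(f(x, b + c), f(x, b) + f(x, c)):
            return False
        if not np.allclose(f(x, a * b), a * f(x, b)):
            return False
    return True

# Hypothetical data points (X1, X2).
x1 = np.array([0.5, 1.0, 1.5, 2.0])
x2 = np.array([1.0, -1.0, 2.0, 0.0])
X = (x1, x2)

def full_model(x, b):
    # b = (b0, b1, b2, b3): b3 is a free coefficient on the product X1*X2
    x1, x2 = x
    return b[0] + b[1] * x1 + b[2] * x2 + b[3] * x1 * x2

def product_constrained(x, b):
    # b = (b0, b1, b2): the X1*X2 coefficient is forced to be b1*b2
    x1, x2 = x
    return b[0] + b[1] * x1 + b[2] * x2 + (b[1] * b[2]) * x1 * x2

print(is_linear_in_params(full_model, X, 4))           # linear in the b's
print(is_linear_in_params(product_constrained, X, 3))  # not linear in the b's
```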
Bruce, you're making life difficult for yourself. Just keep it
simple!
The three definitions -- "linear combination", "linear model", and
"linear function" are distinct and separate. The example I gave as
to why it's perfectly okay for the X's to be products or functions of
other X's (as regression DATA), while not okay for the REGRESSION
coefficients to be products of other coefficients is a good way to
sort out the distinct and separate meanings of those terms.
-- Reef Fish Bob.
Strike the above paragraph as INCORRECT (in general). It
MAY be considered as one with constraints in some special
circumstances. The correct (in general) explanation was given
below when it became apparent what Bruce had in mind: that
some coefficients are of the form b1*b2 or similar, in a way
analogous to how some independent variables can be X1*X2.
As I said later,
> If b3 = b1 * b2, then the model can no longer be written as a
> linear combination of b1, b2, and b3, and it becomes a
> non-linear MODEL.
The different roles the b's and X's play in a regression model are the
key to the distinction of what's allowed and what's not.
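The earlier point can be checked directly: ANOVA-type comparisons of means can be written as a regression on 0/1 indicator variables, with the interaction column being the product of two indicators. This is a minimal sketch assuming NumPy and a hypothetical balanced 2x2 layout with made-up responses. The model with the A*B column is saturated, so the least-squares coefficients reproduce the cell means. b0 is the (A=0, B=0) mean, b1 and b2 are simple effects, and b3 is the interaction contrast.

```python
import numpy as np

# Hypothetical 2x2 layout, 3 replicates per cell (made-up numbers).
A = np.repeat([0, 0, 1, 1], 3)
B = np.repeat([0, 1, 0, 1], 3)
y = np.array([4.0, 5.0, 6.0,       # A=0, B=0
              7.0, 8.0, 9.0,       # A=0, B=1
              5.0, 6.0, 7.0,       # A=1, B=0
              12.0, 13.0, 14.0])   # A=1, B=1

# Indicator (dummy) columns; the interaction column is the product A*B.
D = np.column_stack([np.ones_like(A), A, B, A * B]).astype(float)
b, *_ = np.linalg.lstsq(D, y, rcond=None)

# Cell means for comparison.
cell_means = {(a, c): y[(A == a) & (B == c)].mean() for a in (0, 1) for c in (0, 1)}
print(b, cell_means)
```

With these numbers the cell means are 5, 8, 6 and 13. The fit therefore gives b0 = 5, b1 = 1, b2 = 3 and b3 = 13 - 6 - 8 + 5 = 4, up to floating-point rounding. The product column A*B is a function of the other X's, yet the model stays linear in the b's.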
Bob, I understood you to be saying here that Y does NOT equal a linear
combination of the variables for the equation given above. Did I
misunderstand you? I ask, because further down, you seem to agree that
w0(v0) + w1(v1) + w2(v2) + w3(v3) + w4(v4) + w5(v5) IS a linear
combination of variables (where the v's are defined as shown below), and
that it doesn't matter what the v's are, as long as they are linearly
independent.
>>
>>
>>Okay, this is getting to the heart of my question. It boils down to
>>what is the definition of "linear combination".
>
>
> Bruce, you are REGRESSING! You cited some web page for a definition
> of what is a linear combination and I said something to the effect that
> everyone should know that -- so why the need to cite a webpage?
>
> A linear combination is the dot product of two vectors (arrays if you wish).
> One of the arrays can play the role of the "variables" in one definition,
> and play the role of the "coefficients" in a different definition.
>
> So what's your problem?
>
>
>
>>I have been thinking
>>of linear combination as a weighted sum of k variables:
>>
>> linear combination = w0(v0) + w1(v1) + w2(v2) + w3(v3)...+wk(vk)
>
>
> Exactly!
So far so good.
>
>>For the model you give above, I would have said:
>>
>> w0 = b0, v0 = 1
>> w1 = b1, v1 = X1
>> w2 = b2, v2 = X2
>> w3 = b3, v3 = X1^2
>> w4 = b4, v4 = X2^2
>> w5 = b5, v5 = X1*X2
>
>
> That is exactly what I meant in whatever I said.
Great.
>
>>And I would have called that a "linear combination" of the v's.
>>(That's what I meant when I said, allowing for X2 to be equal to
>>X1-squared, etc).
>>
>>But from what you've said (here, and in other posts), I need to revise
>>my notion of linear combination to disallow v's that are functions of
>>other v's. Is that right?
>
>
> There is nothing to revise. You are creating stumbling blocks for
> yourself. Y = w0(v0) + ... + w5(v5),
>
>
>> w0 = b0, v0
>> w1 = b1, v1
>> w2 = b2, v2
>> w3 = b3, v3
>> w4 = b4, v4
>> w5 = b5, v5
>
>
> which is a linear combination of the vs (as in a linear FUNCTION of
> the vs), and a linear combination of the bs (as in a linear MODEL).
>
> What difference does it make what the v's are, as long as the v's
> are "linearly independent" in the linear models context.
Fine. But then why did you say that Y-hat does not equal a linear
combination of predictors for this example?
Y = b0 + b1X1 + b2X2 + b3X1^2 + b4X2^2 + b5X1X2
That's what made me think you were saying that it DOES matter what the
v's are.
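The condition Bob states, that the v's be linearly independent, is a property of the columns of the design matrix. It is not violated when one v is a nonlinear function of others. A small check with NumPy's `matrix_rank` shows the difference, using random, hypothetical data. X1^2, X2^2 and X1*X2 leave the matrix at full column rank. A column that is an exact linear combination of other columns makes it rank deficient; here that column is the hypothetical 2*X1 - X2.

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.normal(size=20)   # hypothetical predictor values
x2 = rng.normal(size=20)

# v's that are nonlinear functions of other v's: still linearly independent.
V_ok = np.column_stack([np.ones(20), x1, x2, x1**2, x2**2, x1 * x2])

# A v that is an exact LINEAR combination of other v's: rank deficient.
V_bad = np.column_stack([np.ones(20), x1, x2, 2 * x1 - x2])

print(np.linalg.matrix_rank(V_ok), "of", V_ok.shape[1])
print(np.linalg.matrix_rank(V_bad), "of", V_bad.shape[1])
```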
>
>
>
>>Linear combination -- a weighted sum of variables (v's) where:
>>
>>1. No v is a function of any other v,
>
>
> Says who?
I thought that's what you were suggesting. It was news to me.
> As a regression function of Y on the independent variables
> v's, we have seen countless examples in which the v's are
> powers of one another, nonlinear functions of one another, and
> products of any combination of them. All of those are perfectly
> good (and very common) examples of LINEAR MODELS (in regression)!
Great.
Earlier, you said, "In the other linear combination, if the b's are
functions of other b's, we simply have a linear model with CONSTRAINTS
on the estimated coefficients!" But now you are saying that in that
case, the model is not linear. You lost me somewhere back there.
>
> Bruce, you're making life difficult for yourself. Just keep it
> simple!
Hey, I always try to live by the KISS principle. I'm just trying to
reconcile some of your statements that *seem* a bit contradictory to me.
>
> The three definitions -- "linear combination", "linear model", and
> "linear function" are distinct and separate. The example I gave as
> to why it's perfectly okay for the X's to be products or functions of
> other X's (as regression DATA), while not okay for the REGRESSION
> coefficients to be products of other coefficients is a good way to
> sort out the distinct and separate meanings of those terms.
>
> -- Reef Fish Bob.
>
Bob, I'm not trying to be difficult here. I just want to understand what
you are saying.
My apologies to everyone for the amount of quoted material. I didn't
want to be tarred and feathered for selective snipping. ;-)
I didn't see this post until after posting my followup questions.
Sorry for any confusion that causes.
[snip]
>
> I posted what's below on Nov 10, in explaining to schwail how he should
> proceed with his problem because of all the bad advice he got from RU:
>
[Bob goes, slap!]
Bob wrote his very nice piece on fitting a response
surface, which was suggested by the question of the OP.
However, it was aimed far over his head, and not responsive
either to the question, approximately,
"why is my coefficient near zero, no matter what I tried?"
- or to the eventual, deeper, problem, which was, in a
phrase, "gross errors in reading regression printouts."
I started by suggesting 'data problems' and ended by
documenting inconsistencies in what the OP reported in
response.
Bob never read my advice that way.
My last post in that thread can be found at -
http://groups.google.com/group/sci.stat.edu/msg/315db6c73995487b
Here is my post which I labeled "Bob Ling vs Normality",
http://groups.google.com/group/sci.stat.math/msg/12dfe550e4c3287e
where, eventually, I went into some detail about normality --
How to use it, and Why and When.
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
It was far over Richard Ulrich's head, but hardly over ANYONE's
head by telling them that BEFORE fitting any model in regression,
it was a waste of time to check the DISTRIBUTIONS of the
independent variables. That was my follow-up to the OP sehwail's
post, giving what he did below:
sehwail> A. Normality
sehwail> I started with checking the normality of the IVs and DV and
sehwail> got the following numbers:
Variable   Skewness   Std Error   Kurtosis   Std Error
X1           -0.823       0.201      0.609       0.400
X2            0.041       0.201     -0.465       0.400
X3           -0.539       0.201     -0.259       0.400
X4           -0.956       0.201      0.994       0.400
Y            -0.542       0.201      0.173       0.400
sehwail> For this data I concluded that the data is normal
Note that was how sehwail STARTED with his analysis.
Those checks were NEVER appropriate, before, during, or after
fitting a regression model with those variables X1,...X4.
This was my post telling sehwail WHY the above was completely
UNNECESSARY, and then advised him on how he should have proceeded:
> I started by suggesting 'data problems' and ended by
> documenting inconsistencies in what the OP reported in
> response.
>
> Bob never read my advice that way.
Because you had made the SAME blunder sehwail made!
>
> My last post in that thread can be found at -
> http://groups.google.com/group/sci.stat.edu/msg/315db6c73995487b
>
> Here is my post which I labeled "Bob Ling vs Normality",
> http://groups.google.com/group/sci.stat.math/msg/12dfe550e4c3287e
> where, eventually, I went into some detail about normality --
> How to use it, and Why and When.
The post BELOW is the one you should have cited,
instead of your later posts trying to excuse yourself for making the
same blunders. My preceding reply about your "outlier checking"
blunder should go with this too, in view of your persistent
obfuscation about your own errors (post on Jan 3, 2006):
-- Reef Fish Bob.
Bob
--
Bob O'Hara
Dept. of Mathematics and Statistics
P.O. Box 68 (Gustaf Hällströmin katu 2b)
FIN-00014 University of Helsinki
Finland
Telephone: +358-9-191 51479
Mobile: +358 50 599 0540
Fax: +358-9-191 51400
WWW: http://www.RNI.Helsinki.FI/~boh/
Journal of Negative Results - EEB: http://www.jnr-eeb.org
I had said in 2005 that until you answer the two specific questions I
asked of you, several times without an answer, you'll be ignored.
This will be your ONLY freebie in 2006. Your dumb question was
answered in the tinyurl post in my 2nd question, and it was also
in my reply to Bruce Weaver.
> >> If b3 = b1 * b2, then the model can no longer be written as a
> >> linear combination of b1, b2, and b3, and it becomes a
> >> non-linear MODEL.
> >>
> Sorry for asking what looks like a dumb question, but what if b3=b2*b2?
> Reef Fish: Am I right in thinking that you are saying that this is
> non-linear, and therefore the model above would not be a linear model?
> --
> Bob O'Hara
Anon Bob, all your questions are dumb but that's not the reason I'll
ignore you henceforth. It's the fact that you DID NOT READ what I
had already answered, and continue NOT TO READ the post I had
repeatedly pointed you to.
In my reply to Bruce Weaver, I had said of your example, which originated
from Kendall and Stuart and had been beaten to death,
RF> It MAY be considered as one with constraints in some special
RF> circumstances.
What you asked was one of the many special circumstances in the
tinyurl post I had referenced and Bob O'Hara chose not to read.
> Reef Fish: Am I right in thinking that you are saying that this is
> non-linear, and therefore the model above would not be a linear model?
I am saying your dumb question had been addressed a thousand
times -- it is a LINEAR MODEL with a constraint!
If you had read any of the specific posts and threads I've pointed
you to, you wouldn't still be asking the SAME DUMB QUESTION.
From now on, you either go back and read what I had already answered,
or read my answers to questions by OTHERS.
Any question by Anon Bob O'Hara will be ignored.
-- Reef Fish Bob
So, when does the definition about being a linear combination apply, and
when doesn't it?
This is why I've been pressing you on this: the definition you state
seems at odds with your identification of models as linear or not.
The only consistent solution I can see is that a "LINEAR MODEL with a
constraint" is a non-linear model, but in the past you have denied that,
hence my confusion.
I hope you will explain this apparent contradiction: if you ignore my
request I will conclude that you have no explanation, and the
contradiction is real.
Bob
I don't mind differing with Bob on some questions.
Right now, there's "normality" where Bob is screwed up on
the discussion, and "looking at the independent variables",
which has never actually finished.
For instance, from two places in this note, Bob seems to
endorse something I consider a weird posture, of -in my words-
considering "independent variables" in a regression to
be a Black Box whose univariate and bivariate distributions
can't be examined before the analysis, or -even- afterwards,
except as guided by formal regression diagnostics, after
some specified analysis. (It remains a mystery to me,
interpolating his opinion, *who* defines that analysis, based
on *what*.) This is a discussion which has not yet reached
a conclusion. I'll be happy to hear Bob re-state his position
so that I can understand it, if I'm missing it.
But it is terribly hard to find where we differ in some other
cases, when he keeps mis-stating whatever I write, ...
without ever quoting a word of it. And he seems to deny
that I have a right to disown *his* interpretation of me.
The rest of this long post consists of the start of what Bob
just entered, a brief comment by me, plus (fairly short) posts 1-3
and the start of 4 in the relevant thread. I think the content of
mine does not justify Bob's reading of November 7 or of today.
Is Bob being disingenuous, or has he lost the start of the
thread? Three days after his first post, Sehwail reported on
the above, how he "started with his analysis." Bob jumped on
me three days earlier, after the first post.
Here are the posts that started the thread at
http://groups.google.com/group/sci.stat.math/msg/92c8b7453f1a4072
- including my first post which Bob found so offensive, and
the first version of Bob's objection.
===== start of posts in Google-groups thread
1. sehwail
Nov 7 2005, 9:51 am
Newsgroups: sci.stat.math
From: sehwail <sehw...@hotmail.com>
Date: Mon, 07 Nov 2005 09:51:22 EST
Local: Mon, Nov 7 2005 9:51 am
Subject: Linearity- Multiple Regrression
I am working on a regression model with four independent variables and
one dependent variable.
To check linearity I graphed each independent variable vs. dependent
variable using scatter plots in SPSS and found the R-square. All of
the relationships have an R-square between 0.1 and 0.34.
The problem is that when I graph X4 vs Y, I got R-square as 0.0003. I
tried x^2, Log X, Sqrt (x) but nothing could increase r-square to more
than 0.06 .... any comments?
Even when I remove X4 from the model, I still get the standard
deviation of the residual > std dependent ..... which indicates a
non-linearity problem but all the relationships X1,X2,X3 seem to be
linear with Y.
any comments. .... or recommendations?
2. Richard Ulrich
Nov 7 2005, 4:18 pm
Newsgroups: sci.stat.math
From: Richard Ulrich <Rich.Ulr...@comcast.net>
Date: Mon, 07 Nov 2005 16:18:46 -0500
Local: Mon, Nov 7 2005 4:18 pm
Subject: Re: Linearity- Multiple Regrression
On Mon, 07 Nov 2005 09:51:22 EST, sehwail <sehw...@hotmail.com> wrote:
> I am working on a regression model with four independent variables
> and one dependent variable.
> To check linearity I graphed each independent variable vs. dependent variable using scatter plots in SPSS and found the R-square. All of the relationships have an R-square between 0.1 and 0.34.
One big purpose of the recommendation to "look at scatter
plots" is that you should *look* at the plots. What do you *see*?
Is there a curved relationship? If the variables are near-normal,
then the small-relationship graph looks like a slightly elliptical
swarm of points.
> The problem is that when I graph X4 vs Y, I got R-square as 0.0003. I tried x^2, Log X, Sqrt (x) but nothing could increase r-square to more than 0.06 .... any comments?
"There is no linear relationship" is not the same as
"The relationship is non-linear." Consider the possibility,
"There is no relationship."
> Even when I remove X4 from the model, I still get the standard deviation of the residual > std dependent ..... which indicates a non-linearity problem but all the relationships X1,X2,X3 seem to be linear with Y.
> any comments. .... or recommendations?
The residual increases when the variable added to the regression
yields a t-value less than 1.0; that is shown by simple algebra.
Again, "There is no relation" might be your answer.
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
3. sehwail
Nov 7 2005, 4:44 pm
Newsgroups: sci.stat.math
From: sehwail <sehw...@hotmail.com>
Date: Mon, 07 Nov 2005 16:44:12 EST
Local: Mon, Nov 7 2005 4:44 pm
Subject: Re: Linearity- Multiple Regrression
Does "saying there is no linear relationship" violate the assumption
behind regression?
Am I supposed to just ignore this variable (X4), as if it does not exist,
and regress Y against the other three variables only?
When I did that, the standard deviation of the residuals was 0.54 and
the standard deviation of the dependent variable was 0.468. So that
shows that there exists some non-linearity even with three variables
only. But when I looked at the scatter plots for these three variables,
their R-squares ranged between 0.1 and 0.3 (a modest linear
relationship) ... what am I supposed to do?
4. Reef Fish
Nov 7 2005, 9:53 pm
Newsgroups: sci.stat.math
From: "Reef Fish" <Large_Nassau_Grou...@Yahoo.com>
Date: 7 Nov 2005 18:53:30 -0800
Local: Mon, Nov 7 2005 9:53 pm
Subject: Re: Linearity- Multiple Regrression
sehwail wrote:
> Does "saying there is no linear relationship" violate the assumption behind regression?
Not at all. By "the assumption behind regression", one usually means
the PROBABILITY assumption about the ERRORS in a regression
model, regardless of whether the functional relationship between Y and
the Xs is linear or nonlinear.
In that regard, Richard Ulrich gave you some erroneous advice about
what to look at in the scatter plots. There is NOTHING you can look
at in the scatter plots of your Y vs any of the Xs that is relevant to
the "normality" issue of the errors. You can't look at any plot about
the distribution assumption about the errors until AFTER you have
attempted some fit, and have the residuals to look at.
Plotting Y vs each of the X's is one of the items I called "Pitfalls in
Multiple Regression Analysis", for the reasons I'll explain below.
[snip, rest of note and thread]
===== end of copying from Google-groups.
[ snip, rest of Bob's]
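Ulrich's distinction in post 2 above is that "There is no linear relationship" is not the same as "The relationship is non-linear". A small NumPy sketch on made-up data illustrates it. With X placed symmetrically about zero and Y = X^2 exactly, the straight-line R-square is essentially zero even though Y is a deterministic function of X. Adding an X^2 term recovers the relationship. For pure noise (the "no relationship" case), the extra term does not. The helper `r_squared` is hypothetical, written for this example.

```python
import numpy as np

def r_squared(design, y):
    """R^2 of an OLS fit of y on the given design matrix (with intercept column)."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centered = y - y.mean()
    return 1.0 - (resid @ resid) / (centered @ centered)

x = np.linspace(-3, 3, 13)          # symmetric about zero
y_curved = x**2                     # exact, but NON-linear, relationship
rng = np.random.default_rng(0)
y_none = rng.normal(size=x.size)    # hypothetical: no relationship at all

straight = np.column_stack([np.ones_like(x), x])
quadratic = np.column_stack([np.ones_like(x), x, x**2])

r2_curved_line = r_squared(straight, y_curved)
r2_curved_quad = r_squared(quadratic, y_curved)
r2_none_quad = r_squared(quadratic, y_none)
print(r2_curved_line, r2_curved_quad, r2_none_quad)
```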
You were stabbing your own foot in the preceding paragraphs.
I left them intact! Is that good enough for citing you?
There are NO distributional assumptions on the independent
variables X. So why should anyone check them for normality or
outliers or anything else that has to do with the DISTRIBUTIONS
of them?
Snip all Richard Ulrich's irrelevant and distorted rehash of what
anyone can read for themselves in the thread!
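Ulrich's remark in post 2 was that the residual grows when an added variable has a t-value below 1.0. This follows from the partial F identity t^2 = (RSS_old - RSS_new) / s^2_new, which gives s^2_old / s^2_new = (df + t^2) / (df + 1). Here df is the residual degrees of freedom of the larger model, and "residual" means the residual mean square. The sketch below checks the identity numerically with NumPy on hypothetical simulated data. The helper `fit` is illustrative, not taken from the thread.

```python
import numpy as np

def fit(design, y):
    """OLS fit; returns residual mean square, t-values and residual df."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    n, p = design.shape
    df = n - p
    s2 = (resid @ resid) / df
    cov = s2 * np.linalg.inv(design.T @ design)
    return s2, coef / np.sqrt(np.diag(cov)), df

rng = np.random.default_rng(42)
n = 40
x1 = rng.normal(size=n)
x_new = rng.normal(size=n)          # hypothetical candidate predictor
y = 1.0 + 0.5 * x1 + rng.normal(size=n)

X_old = np.column_stack([np.ones(n), x1])
X_new = np.column_stack([X_old, x_new])

s2_old, _, _ = fit(X_old, y)
s2_new, t_new, df_new = fit(X_new, y)
t_added = t_new[-1]

# Identity: s2_old / s2_new == (df_new + t^2) / (df_new + 1)
ratio = s2_old / s2_new
predicted = (df_new + t_added**2) / (df_new + 1)
print(f"t = {t_added:.3f}, s2_old/s2_new = {ratio:.4f}, identity gives {predicted:.4f}")
```

So the residual mean square shrinks only when |t| > 1. Relatedly, since the adjusted R-square equals 1 - s^2 / var(Y), a residual SD larger than the SD of Y (as sehwail reported) simply means a negative adjusted R-square.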
> > It was far over Richard Ulrich's head, but hardly over ANYONE's
> > head by telling them that BEFORE fitting any model in regression,
> > it was a waste of time to check the DISTRIBUTIONS of the
> > independent variables. That was my follow-up to the OP sehwail's
> > post, giving what he did below:
> >
> > sehwail> A. Normality
> > sehwail> I started with checking the normality of the IVs and DV and
> > sehwail> got the following numbers:
> > Variable   Skewness   Std Error   Kurtosis   Std Error
> > X1           -0.823       0.201      0.609       0.400
> > X2            0.041       0.201     -0.465       0.400
> > X3           -0.539       0.201     -0.259       0.400
> > X4           -0.956       0.201      0.994       0.400
> > Y            -0.542       0.201      0.173       0.400
> > sehwail> For this data I concluded that the data is normal
> >
> > Note that was how sehwail STARTED with his analysis.
Exactly. It's a complete waste of time to do his normality check
at the start, during, or at the end of ANY of his, or any regression
analysis.
Richard Ulrich simply continues to wallow in his ignorance about
regression ASSUMPTIONS -- what can, and what needs to, be checked.
The distribution of the data values in X is one of those that
can be ANYTHING.
> Is Bob being disingenuous, or has he lost the start of the
> thread?
YOU (Richard Ulrich) are the one who lost the start of the thread, and
never understood ANYTHING in any of the related threads.
This discussion has branched into several threads, but the thread
which sehwail STARTED was the one you stated below:
> Here are the posts that started the thread at
> http://groups.google.com/group/sci.stat.math/msg/92c8b7453f1a4072
>
> - including my first post which Bob found so offensive, and
> the first version of Bob's objection.
>
> ===== start of posts in Google-groups thread
> 1. sehwail
> Nov 7 2005, 9:51 am
>
> Newsgroups: sci.stat.math
> From: sehwail <sehw...@hotmail.com>
> Date: Mon, 07 Nov 2005 09:51:22 EST
> Local: Mon, Nov 7 2005 9:51 am
> Subject: Linearity- Multiple Regrression
My Dec 7 post in THAT thread (the one above) should have been the end
of it. That was post No. 24 in the thread.
Richard's January 3, 2006 post was post No. 25 of that thread, rehashing
his dead horses -- that that was what I said in post No. 28, in the original
sehwail thread to post No. 27 which had a changed subject name, to
reflect all the muddle, quackery, and malpractice of Richard Ulrich.
From: statm...@earthlink.net
Local: Wed, Jan 4 2006 8:58 am
Subject: Re: Rich Ulrich continues his statistical Muddle, Quackery, and MALPRACTICE
Now Richard Ulrich continues his diversionary tactics and obfuscation by
using the SUBJECT which is NOT one of the SIX different subjects in
the Sehwail thread while saying,
> Is Bob being disingenuous, or has he lost the start of the thread?
I know exactly where the thread started, has been, and branched off
to other threads.
So, I am changing the SUBJECT back to the subthread in which
posts 24, 25, 26, through 28 appeared (where 28 was Richard
Ulrich's post of January 4, following my Dec 7 post No. 24).
Richard Ulrich is the one who continues to MALPRACTICE, and make
the SAME blunders that had been pointed out to him dozens of times.
The independent variable X in a multiple regression CAN be any one
of these:
1. An indicator variable with values 0 or 1.
2. A discrete uniform distribution of ranks.
3. A distribution that came from the Cauchy or other long-tailed
distributions that would appear to have outliers (compared to "normal").
4. The distribution of an observed X can be severely bimodal,
trimodal, left skewed or right skewed ... and in short, any
distribution that has ever been observed in the entire history
of statistical distributions that are NOT NORMAL can be
the distribution of the X used in any multiple regression.
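Consistent with the list above, the standard least-squares assumptions put the distributional burden on the errors, not on the X's. A rough simulation sketch fits Y = 2 + 3X + error with normal errors for four kinds of X: an indicator, ranks, Cauchy draws and a bimodal mixture. It uses NumPy, and the sample size, seed, true coefficients and X distributions are all hypothetical choices. In each case the estimated slope lands close to 3.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical X's of the kinds listed above; none of them is normal.
x_options = {
    "indicator": rng.integers(0, 2, size=n).astype(float),
    "ranks": rng.permutation(n).astype(float) + 1.0,
    "cauchy": rng.standard_cauchy(size=n),
    "bimodal": np.concatenate([rng.normal(-4, 1, n // 2),
                               rng.normal(4, 1, n - n // 2)]),
}

slopes = {}
for name, x in x_options.items():
    # Same error model in every case: normal, mean 0, sd 1.
    y = 2.0 + 3.0 * x + rng.normal(size=n)
    design = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    slopes[name] = coef[1]

for name, slope in slopes.items():
    print(f"{name:10s} slope = {slope:.4f}")
```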
So, why would sehwail and Richard Ulrich want to check the "normality"
or "outliers" of the data distributions in the INDEPENDENT
variables X?
sehwail can be excused for not knowing better. He has been
silent ever since my ORIGINAL response to his Post No. 1 in
the thread.
Richard Ulrich is the one who continues to come back with his
original blunder of checking what is ABSOLUTELY unnecessary
and absolutely USELESS in doing sehwail's regression problem.
Why?
I've refrained from saying this for at least the past 100 posts of
this kind by Richard Ulrich:
Because: Richard Ulrich's knowledge about statistics is at the
kindergarten level. And Rich Ulrich's continued argument on
the unarguable only proved that he is an IDIOT. I can't think
of a more fitting term for Ulrich, based on the vacuous substance
of his posts and his being oblivious to ALL his statistical blunders.
Only an IDIOT would say that
Y = aX is a nonlinear model -- Richard Ulrich did, and he is the
only IDIOT in the annals of statistics who did.
His current non-recognition of the uselessness of checking for
the normality and outliers in X is worse than nonrecognition
of the linear model above.
I rest my case. Richard Ulrich has gone far beyond the
excusable bounds of stupidity and ignorance.
-- Reef Fish.
I concur that there are no distributional assumptions on the independent
variables "for use in a regression".
However, I believe that it is crucial to examine the distributions of
ALL variables. Even "housekeeping" variables such as case IDs, record
numbers, etc.
Social scientists, accountants, and many statisticians are taught that
before you do anything with models or complex analysis you should do
several things for quality assurance. This is called cleaning,
preparing, and understanding your data. Before you go anywhere near a
regression (etc.) you should be sure that the variables and set of cases
adequately represent the phenomena they are alleged to.
(The Challenger came down because the set of cases considered in the
analysis was incomplete.)
The first is that if you have original data, you either (enter it twice and
compare the entries, resolving conflicts) or (enter it once and proofread
it). People frequently do things such as transpose digits.
Then one should examine the univariate distribution of every variable to
be sure that it is a plausible representation of the construct it
purports to represent in the context in which the study is being done.
The distributions of variables should be plausible. All values of a
variable should be in the defined domain. Unless one is studying
hermaphrodism, Klinefelter's, or other chromosomal or developmental
anomalies there should only be two sexes. If depreciation and
appreciation are recorded in separate variables as they usually are in
IRS files there should be no negative values. If a number is supposed to
be the log of something its input should not be a negative value. Each
case should be entered only once in the data set. etc. There should be
no more items correct on a quiz than there are items (e.g., not 12 out
of 10). Values that are missing should have distinguishing values.
Then one should examine bivariate distributions to look for anomalies
(which usually are data entry errors). Crosstabs, 2D scatterplots, etc.
are common tools in this. Is there an 85 pound six month old? Is there
someone with a 780 Math GRE and a 2.0 undergrad GPA?
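The univariate and bivariate checks Art describes are easy to script. The following plain-Python sketch flags a duplicated case, a sex code outside the coded domain, more quiz items correct than items, and an implausible weight for age. The records, field names and thresholds are hypothetical; in particular, the 40 lb limit for a six-month-old is a made-up example, not a clinical cutoff.

```python
# Hypothetical records; field names and limits are illustrative assumptions.
records = [
    {"id": 1, "sex": "F", "quiz_correct": 8,  "quiz_items": 10, "age_months": 6,  "weight_lb": 16.0},
    {"id": 2, "sex": "M", "quiz_correct": 12, "quiz_items": 10, "age_months": 30, "weight_lb": 28.0},
    {"id": 2, "sex": "M", "quiz_correct": 12, "quiz_items": 10, "age_months": 30, "weight_lb": 28.0},
    {"id": 3, "sex": "X", "quiz_correct": 5,  "quiz_items": 10, "age_months": 6,  "weight_lb": 85.0},
]

def check(records):
    """Return a list of (case id, problem) pairs found by simple rule checks."""
    problems = []
    seen = set()
    for r in records:
        rid = r["id"]
        # Each case should be entered only once.
        if rid in seen:
            problems.append((rid, "duplicate case"))
        seen.add(rid)
        # Values must lie in the defined domain.
        if r["sex"] not in {"F", "M"}:
            problems.append((rid, "sex outside coded domain"))
        if not 0 <= r["quiz_correct"] <= r["quiz_items"]:
            problems.append((rid, "more items correct than items"))
        # Bivariate plausibility (hypothetical threshold).
        if r["age_months"] <= 6 and r["weight_lb"] > 40:
            problems.append((rid, "implausible weight for age"))
    return problems

problems = check(records)
for rid, problem in problems:
    print(rid, problem)
```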
Further exploration, understanding, and quality assurance is then done
with multiway crosstabs, 3D scatterplots, 3D scatterplots with different
color coding of data points, etc.
If the situation calls for it there are further multivariate methods
for detecting suspicious values which either have to be corrected or
failing that proven as outliers that need special handling such as
carrying an additional variable in a model to flag this case as special,
bringing the value up/down to some value closer to the center, trying
models including/excluding that case, etc.
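Two of the options above give the same remaining coefficients: carrying an extra variable that flags the suspicious case, and fitting the model without that case. A 0/1 indicator that is 1 only for one case lets the fit pass exactly through that case. The other coefficients are therefore those from the fit that drops it. A minimal NumPy sketch with simulated data; the contaminated case index 7 and the +15 shift are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 30
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=n)
y[7] += 15.0   # hypothetical suspicious value in case 7

base = np.column_stack([np.ones(n), x])

# (a) Fit with all cases.
b_all, *_ = np.linalg.lstsq(base, y, rcond=None)

# (b) Fit excluding case 7.
keep = np.arange(n) != 7
b_drop, *_ = np.linalg.lstsq(base[keep], y[keep], rcond=None)

# (c) Keep case 7 but add a 0/1 flag variable that is 1 only for it.
flag = (np.arange(n) == 7).astype(float)
b_flag, *_ = np.linalg.lstsq(np.column_stack([base, flag]), y, rcond=None)

print("all cases:    ", b_all)
print("case dropped: ", b_drop)
print("flag variable:", b_flag[:2], "flag coef:", b_flag[2])
```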
Many statistical procedures then generate additional variables, scale
scores, predicted group membership, regression residuals, etc. The
distributions of these should also be critically examined.
Art
A...@DrKendall.org
Social Research Consultants
In the context of the present discussion, it was more than just "for use
in a regression". It was about what sehwail actually DID that was an
exercise in futility, which was supported by Richard Ulrich. This was
the context of the actual problem. It was shown in the post of mine
to which your present post followed, but you apparently overlooked
both the CONTEXT and the SUBSTANCE of what was done below:
sehwail> A. Normality
sehwail> I started with checking the normality of the IVs and DV and
sehwail> got the following numbers:
Variable   Skewness   Std Error   Kurtosis   Std Error
X1           -0.823       0.201      0.609       0.400
X2            0.041       0.201     -0.465       0.400
X3           -0.539       0.201     -0.259       0.400
X4           -0.956       0.201      0.994       0.400
Y            -0.542       0.201      0.173       0.400
sehwail> For this data I concluded that the data is normal
RF> Note that was how sehwail STARTED with his analysis.
RF> Those checks were NEVER appropriate, before, during, or after
RF> fitting a regression model with those variables X1,...X4.
RF> This was my post telling sehwail WHY the above was completely
RF> UNNECESSARY, and then advised him on how he should have proceeded:
> However, I believe that it is crucial to examine the distributions of
> ALL variables. Even "housekeeping" variables such as case IDs, record
> numbers, etc.
I am a data analyst. Examining data for recording (or other) errors is
routine practice. I even cited an example of Consumer Union data
in which more than 10% had "impossible" keypunched values, and
had to be discarded.
But you're preaching to the wrong choir in the wrong church.
Take a look at what sehwail did -- THAT's the context of the present
discussion -- that he tested the independent variables for normality.
Tell me why if you think it was NOT an exercise in futility and a
complete waste of time.
A more subtle error is that the observed Y's should NOT be checked
for normality, because they are NOT supposed to be from a single
normal distribution.
The rest of your comments are to a large extent valid ones, but not
applicable to the actual analysis of the problem introduced by
sehwail.
-- Bob.
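Bob's point that the observed Y's are not supposed to come from a single normal distribution can be seen in a short simulation, using NumPy and hypothetical data. The setup has a two-group indicator X, a large group effect and exactly normal errors. The marginal distribution of Y is then bimodal, with strongly negative excess kurtosis, while the residuals from the fitted regression look normal. The helper `excess_kurtosis` is a simple moment estimator written for this example.

```python
import numpy as np

def excess_kurtosis(z):
    """Moment estimator of excess kurtosis (0 for a normal distribution)."""
    z = z - z.mean()
    return np.mean(z**4) / np.mean(z**2) ** 2 - 3.0

rng = np.random.default_rng(0)
n = 10_000
x = np.repeat([0.0, 1.0], n // 2)         # hypothetical two-group indicator
y = 2.0 + 10.0 * x + rng.normal(size=n)   # normal errors, large group effect

design = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
resid = y - design @ coef

print("excess kurtosis of Y:        ", excess_kurtosis(y))
print("excess kurtosis of residuals:", excess_kurtosis(resid))
```

The normality assumption concerns the errors, so it is the residuals, not the raw Y's (or X's), that are informative here.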
The current title embarrasses the entire newsgroup. If a new person
comes to the newsgroup, they see this thread title and think, "okay, I
see this is one of *those* kinds of groups." That isn't fair to them
or us.
Thanks!
John Uebersax
A mortal sin, no.
The end of civilization as we know it, no.
Thorough exploration of all variables should be done.
However, it should be done for quality assurance and understanding of
one's data.
An exercise in futility with regard to regression, yes.
A waste of time with regard to regression, yes.
Based on misunderstanding of regression, yes.
Based on misunderstanding of why thorough examination of variable
distributions is crucial, yes.
An exercise in futility for quality assurance and understanding of
one's data, no.
A waste of time for quality assurance and understanding of
one's data, no.
A mortal sin, no.
The end of civilization as we know it, no.
Art
That was precisely the problem of sehwail on which I based my
comments about the futility of his exercise below:
> > sehwail> A. Normality
> > sehwail> I started with checking the normality of the IVs and DV and
> > sehwail> got the following numbers:
> > Variable   Skewness   Std Error   Kurtosis   Std Error
> > X1           -0.823       0.201      0.609       0.400
> > X2            0.041       0.201     -0.465       0.400
> > X3           -0.539       0.201     -0.259       0.400
> > X4           -0.956       0.201      0.994       0.400
> > Y            -0.542       0.201      0.173       0.400
> > sehwail> For this data I concluded that the data is normal
> >
> > RF> This was my post telling sehwail WHY the above was completely
> > RF> UNNECESSARY, and then advised him on how he should have proceeded:
> >
> > RF> http://tinyurl.com/dflc6
For the full text of my comments on how sehwail could have, or should have,
proceeded, see the post referenced above.
-- Bob.