I am killing two birds (in a non-derogatory sense) with one stone
here, to bring the discussion back down to earth: the Real World
of science, statistics, and modeling via statistical methods.
Serendipitously, THIS subject fits in as a PREREQUISITE discussion
for the "Linear regression models with CONSTRAINTS on the
parameters" thread, which is only beginning.
Jerry Dallal wrote:
Russell...@wdn.com wrote:
RM: Right, but I'm thinking more of general ideas than specific cases.
RM: How does one *know* (in the sense of inductive reasoning about the
RM: world, not some metaphysical certaintude) that the underlying
RM: process that generated the data is linear or nonlinear (however
RM: one defines those terms ;-) )?
This is a non-issue within the framework of the present discussion
of the DEFINITION of Linear Regression models.
Whether DATA one uses to model came from a linear or nonlinear (or
whatever) process is an entirely separate question in Science.
I have a reprint of this "must read" paper, for everyone:
Box, G.E.P. (1976), "Science and Statistics," JASA 71, 791-799
from which I can cite some relevant ideas and quotes. The article
was cited in my Encyclopedia of Statistical Sciences article on
"Interactive Data Analysis," which is a way of putting one of
Box's ideas into practice: that a statistician MUST be both a
SPONSOR of (one or more) models AND at the same time the CRITIC
of the same model s/he sponsors, before a choice can be made.
Box, Tukey, and I strongly agree on the notions of EXPLORATORY
data analysis, the iterative process of "model building", as well
as our views about Science and Statistics.
Russell expounded further:
"In the real world, there will be data, with errors, to which
models will be fit. Residuals will be determined, and some
characteristics of the residuals (e.g. some summary statistics
such as MSE and probably a dozen more whose names escape me
at the moment or I've never even heard of) will be calculated.
Say the MSE's from both models are, for all intents and purposes,
equal? One can check other characteristics. Say half of these
favor the linear model, half the non-linear. One can even
compare the residuals point by point. Maybe there is no
answer to the question (at least that statistics can provide),
except perhaps acquire more data and hope it gives a more
definitive "answer". Maybe if the differences are statistically
indistinguishable then it really doesn't matter, but I think
it does because conclusions may be drawn and extrapolations
(both the dangerous mathematical kind and the more dangerous
intellectual kind) may be made. I suppose that the same
question really exists in determining which model to accept
no matter whether the differences between models are
linear vs. nonlinear or linear with N parameters vs. linear
with M parameters, etc. But it seems to me that the difference
between linear and nonlinear is in some sense more fundamental
than linear with N parameters vs. linear with M parameters,
but I may be wrong. Anyway, thanks for your reply. "
In Science, all models are TENTATIVE, and Box correctly emphasized
that "all models are wrong", and one should SELECTIVELY worry about
what's wrong.
The age of the universe has changed by billions of years (according to
different scientific theories) every decade or two during the past
century.
Now I turn to comment on Jerry Dallal's points.
JD "These are deep questions. I don't know whether there's a right
answer.
I think the best you can hope for is informed viewpoints. My simple
responses would be along the lines of noting that using statistics
to determine the process that *generated* the data gets into cause
and effect, which is not a good place to go."
That's putting it FAR too mildly. Statistical Quackery and Malpractice
occur every day when SOME "social scientists" and applied statisticians
who are black-magicians from all walks use purely observational data
(WITHOUT any experimental control) to draw "causal inference" from
correlational data; or to build "explanatory" models by using the
"expected SIGN fallacy/abuse in multiple regression". These
abuses have been extensively discussed by me in this forum against
the malpractitioners. Don't argue them HERE. Just go to the
archives and pick up on those threads or start new threads on those
subjects.
JD: "I tell students that they can use statistical methods to
tell them where to look, but cause and effect is something
that has to be determined elsewhere. "
You should also point them to the Box article I cited.
JD: In the biological sciences, by uncovering mechanisms.
And even if the mechanism cannot be uncovered, there is at least
the methodology of "designed experiments" to safeguard against some
of the defects of using observational data without any design.
For biological and medical "effects" of drugs, even carefully
controlled experiments may NOT be able to control some of the
uncontrollable interactions between INDIVIDUAL patients and
specific drugs!
But the valid statistical methods for "cause and effect" studies
are certainly much more scientific and logically sound and
credible than a "social scientist" doodling "path diagrams" on
a voodoo pad and claiming causation established!
Jerry continued,
" But, suppose two models describe a set of data equally well,
which one to choose? One would have to design an experiment
to distinguish between them. Or, one might appeal to similar
but different situations. If there are two models that fit
the data equally well, then there's not much more that
statistical methods can do for you. In practice, one usually
uses Occam's razor, that is, choosing the simplest model "
This is one of the scientific aspects of "model building". This is
what Box (JASA 1976, p.792) and others called the principle of "Parsimony".
Box correctly EMPHASIZED "Since all models are wrong the scientist
must ..." in the first sentence of each of subsections 2.3 "Parsimony"
and 2.4 "Worry Selectively".
Box (2.4, p. 792) "It is inappropriate to be concerned about mice
when there are tigers aboard."
Mathematicians and Mathematical Statisticians have a strong tendency
to worry about mice and miss the tiger, and the entire forest.
Box also had this very apt indictment against such mathematicians:
"Mathematistry is characterized by development of theory for
theory's sake, which since it seldom touches down with practice,
has a tendency to redefine a problem rather than solve it."
(Box 4.1, p. 797)
JD: consistent with the data. The "catch" is simplest. The model
JD: viewed as simplest can change depending on the subject matter
JD: knowledge of the research team.
Of course. That's SCIENCE, and statistics is merely a TOOL used in
scientific inquiry. Since Newton was hit on the head by an apple,
Einstein has taken over, temporarily, with his relativity theories. These
were soon overthrown by quantum theories in nuclear physics, of
quarks and other angels who danced on pins; not to mention that
Einstein, at the Princeton Institute "from 1933 to his death in 1955
tried UNSUCCESSFULLY to construct a unified gravitational theory
and light that would treat both as different manifestations of
the same phenomenon." (Paul Hoffman's 1998 book on Erdos);
and these angels have since been driven off by the latest theories ...
Statistics and the PROPER USE of statistics for scientific inquiry
are still in the STONE AGES.
Just look around THIS newsgroup, and see how many are oblivious
about how Multiple Regression methods can be VALIDLY applied;
how many are practicing mathematistry; and how many are subsisting
on the principle "ignorance is bliss" -- and you'll see the sad
STATE of AFFAIRS in the field and practice of Statistics TODAY.
Jerry again,
"Until the sharper experiment is performed or additional subject
matter is brought to bear, there are two possibilities (models),
neither of which can be dismissed. But, this is what the
scientific method is all about--designing sharper experiments
(building more detailed models) as our knowledge increases so
that our knowledge can increase further still."
Well put, and it fits right into Box's (1976) theme of
"Science and Statistics"
-- Bob.
Right, which was why I pulled it out and started a thread with a
different name. ;-)
>
> Whether DATA one uses to model came from a linear or nonlinear (or
> whatever) process is an entirely separate question in Science.
>
> I have a reprint of this "must read" paper, for everyone:
>
> Box, G.E.P. (1976), "Science and Statistics," JASA 71, 791-799
Thanks, I'll have to see if I can hunt that up if I ever get over
to the appropriate library (I'm not used to being on a campus with
multiple libraries as opposed to a main library and departmental
reading rooms, which we seem to have here, too.)
Absolutely, and the test comes with how, and especially how well,
one is selective. I'm asking questions about the fuzzy areas (or
at least areas I find fuzzy) because that's where it would seem
to be most critical. The obvious cases will be hopefully just
that.
rest snipped
Cheers,
Russell
> In Science, all models are TENTATIVE, and Box correctly emphasized
> that "all models are wrong", and one should SELECTIVELY worry about
> what's wrong.
>
>
Interesting idea. In some sense, the idea is traceable to the concepts Kuhn
set out in "The Structure of Scientific Revolutions" in 1962.
Essentially, everything that we "know" is really a model that rests upon a
core on which we base the whole set of models, namely "the
paradigm".
Of course, we "know" nothing, and all the models have something inherently
wrong with them (I'm thrilled, BTW, if an undergraduate can actually come
to terms with this). We can selectively worry about what's wrong, and thus
target and possibly improve certain aspects of the model. This is
everyday, run-of-the-mill science.
Kuhn goes one step further. When the outcome of the experiment is just so
absolutely different from what we would predict based on the current
paradigm, we are forced to throw out the entire paradigm, and form a new
one. This is the scientific revolution. Of course, since those who
maintain the body of knowledge are reticent to throw out everything they
hold dear, scientific revolutions are rare things. When these things come
about, this is when the models take a real jump.
Interestingly enough, the examples you cite (Newton and Einstein) can
arguably both be put at the center of scientific revolutions in the Kuhnian
sense.
If I had to point out a difference from Box's outlook (at least as I understand it
from your thoughtful post), "selectively" worrying about what's wrong will
not bring about a scientific revolution, which can only come about when
very bright people start worrying about what's globally wrong.
Scott
Russell...@wdn.com wrote:
> Reef Fish wrote:
> > Whether DATA one uses to model came from a linear or nonlinear (or
> > whatever) process is an entirely separate question in Science.
> >
> > I have a reprint of this "must read" paper, for everyone:
> >
> > Box, G.E.P. (1976), "Science and Statistics," JASA 71, 791-799
>
> Thanks, I'll have to see if I can hunt that up if I ever get over
> to the appropriate library (
JASA (old journals) should be easily accessible, and some articles
may even be accessible on-line. If you can't locate a copy easily,
let me know and I'll see if I can scan a copy and email it to you. My
copy of a copy was used in my Lecture Notes on Data Analysis as a
"hand out", with hand-written underscores on the most important
statements (for students whose attention span was less than 1/20 of
the 10 pages.) :-)
> > In Science, all models are TENTATIVE, and Box correctly emphasized
> > that "all models are wrong", and one should SELECTIVELY worry about
> > what's wrong.
>
> Absolutely, and the test comes with how, and especially how well,
> one is selective. I'm asking questions about the fuzzy areas (or
> at least areas I find fuzzy) because that's where it would seem
> to be most critical. The obvious cases will be hopefully just
> that.
George Box has said many things that fit right up your alley.
You'll enjoy every bit of his good sense as well as his articulation
of what Science and Statistics are all about!
Better 29 years late than never. :-)
-- Bob.
Scott Seidman wrote:
> "Reef Fish" <Large_Nass...@Yahoo.com> wrote in
> news:1118683313.7...@g43g2000cwa.googlegroups.com:
>
> > In Science, all models are TENTATIVE, and Box correctly emphasized
> > that "all models are wrong", and one should SELECTIVELY worry about
> > what's wrong.
> >
> >
>
> Interesting idea. In some sense, the idea is traceable to the concepts Kuhn
> set out in "The Structure of Scientific Revolutions" in 1962.
> Essentially, everything that we "know" is really a model that rests upon a
> core on which we base the whole set of models, namely "the
> paradigm".
I am sure the idea of what's behind the scientific method of inquiry
predated Box, Kuhn, and even Tukey's (1962) classic paper "The Future
of Data Analysis" in the Annals of Mathematical Statistics, some 67
pages of WORDS, whereas a typical paper in the AMS at the time
was about 3 pages full of mathematical symbols and equations.
>
> Of course, we "know" nothing, and all the models have something inherently
> wrong with them (I'm thrilled, BTW, if an undergraduate can actually come
> to terms with this). We can selectively worry about what's wrong, and thus
> target and possibly improve certain aspects of the model. This is
> everyday, run-of-the-mill science.
>
> Kuhn goes one step further. When the outcome of the experiment is just so
> absolutely different from what we would predict based on the current
> paradigm, we are forced to throw out the entire paradigm, and form a new
> one. This is the scientific revolution.
There were many such revolutions in areas of physics and mathematics,
with which I am more familiar than most other branches of science.
> Of course, since those who
> maintain the body of knowledge are reticent to throw out everything they
> hold dear, scientific revolutions are rare things. When these things come
> about, this is when the models take a real jump.
The earth was the center of the universe when it was unthinkable that
God could have put the earth anywhere BUT the center of all things.
A few folks in the Middle Ages were burnt at the stake for thinking
otherwise. Credible Rumor has it that the latest theory about the
universe puts the earth back at the center, in the mathematical sense
that in an infinite continuum every point can be considered the
center.
>
> Interestingly enough, the examples you cite (Newton and Einstein) can
> arguably both be put at the center of scientific revolutions in the Kuhnian
> sense.
Einstein's theories are really old, primitive theories now. Godel
pulled the rug out from under the mathematicians by proving that no
sufficiently rich system of mathematics can be both consistent and
complete, nor prove its own consistency from within. Russell and
Whitehead threw up their hands after toiling for 20 years on the
Principia Mathematica; and Herman and Jean Rubin wrote a little book
on the dozens of major mathematical statements, from throughout the
centuries, that are equivalent to the Axiom of Choice, in the wake of
Godel's rug pulling. :-)
>
> If I had to put a difference on Box's outlook (at least as I understand it
> from your thoughtful post), "selectively" worrying about what's wrong will
> not bring about a scientific revolution, which can only come about when
> very bright people start worrying about what's globally wrong.
But Rome wasn't built in one day. Much scientific progress is
made by taking small steps correctly and in the correct direction!
In fact, many revolutionary discoveries come from ACCIDENTS, such
as those by Mme Curie and other scientists who happened to have
been at the right place at the right time in what they found.
I am glad you found almost complete accord with Box's view about
science (even if not about statistics, of which you didn't say
much. :-))
-- Bob.
>
> Scott
>
> But Rome wasn't built in one day. Much scientific progress is
> made by taking small steps correctly and in the correct direction!
> In fact, many revolutionary discoveries come from ACCIDENTS, such
> as those by Mme Curie and other scientists who happened to have
> been at the right place at the right time in what they found.
>
Agreed. The paradigm failures are often gradual, with many small model
failures across many scientists and studies, eventually causing one
particularly bright scientist (or natural philosopher) to put the whole
shebang together into a new paradigm. The accident thing is, when you
think about it, largely a required situation for a scientific
revolution. Either your observations are inside the paradigm, or they
are outside. I've never heard of a scientist purposefully designing an
experiment to test the paradigm-- that is, until the paradigm was already
on its way down.
> I am glad you found almost complete accord with Box's view about
> science (even if not about statistics, of which you didn't say
> much. :-))
History of science is sort of a hobby of mine. You spoke of special
mentors earlier, and I think I was exposed to one of those, which
resulted in my filling most of my meager humanities requirements within
my engineering program with History of Science coursework. The first
course dragged me in. It was entitled "The Natural and the
Artificial: The History of Man-Made Man". As I learned, right up until
Darwin (a special sort of scientific revolution), science could largely
be defined by how man viewed himself with respect to God. Thus, one
could learn a whole bunch about science by studying cases in literature
in which man created other men, all while discussing and reading the
science philosophy of the time. We went all the way from Rabbi Loeb's
rendition of the Golem, to a 1970 movie, "Colossus: The Forbin
Project", with wonderful things like neoplatonism, Mary Shelley and H.G.
Wells in between. I was just hooked.
As far as stats goes, we're probably a lot closer in outlook than you
think. I use stats as a tool in neuroscience, and I learned linear
models from a very practical point of view-- from an Operations Research
Department, no less. Surrounded by young bank vice presidents who were
trying frantically to understand the course work, the prof (who actually
came up with the most reliable model, the one that managed brassiere
distribution for all of South Africa) pulled me aside one evening and tried
to woo me into the department. Had I been smart, I would have gone over.
In the meantime, I'm the guy who flinches just about every time I see a
neuroscientist incorrectly apply a regression, ignore the obvious
correlation between residuals and inputs, or stumble around trying to
quantify a difference in slope because they don't know what a dummy
variable is. It's been a while since I've taken all my formal
coursework, though, and more importantly, I've never had to teach stats
(and hopefully never will).
Scott
It's available through JSTOR, if your institution has access.
--
Bruce Weaver
bwe...@lakeheadu.ca
www.angelfire.com/wv/bwhomedir
Bruce,
Thanks for the JSTOR reference; I hope Russell can access it.
I am not familiar with this resource probably because it has only
about 180 college subscribers.
I was thinking of the on-line JASA access by ASA members.
I vaguely recall the latest status is something like the most
RECENT issues are accessible only by subscription by ASA members,
but the older JASA journals and articles are available to any
ACTIVE members of the ASA.
Does someone have the up-to-date information about this?
I am a Life Member of the ASA (not an honor but a designation
for anyone who donated $3000 to the ASA building funds long
ago). As a Life Member, I get "free" memberships annually
(which I would have cancelled otherwise) as well as a
permanent flow of current issues of JASA, AMstat News, and
the American Statistician (which complements my other junk
mail :-)).
But for an occasional useful reference such as the Box
article, it would be nice if some ASA member who is familiar
with retrieving archived ASA papers would post some tips on
how that can be done.
-- Bob.
Thanks, Bruce, I imagine it's around Cornell someplace, but I'm
still getting settled in.
Cheers,
Russell
Scott Seidman wrote:
> "Reef Fish" <Large_Nass...@Yahoo.com> wrote in
> news:1118691585.9...@g14g2000cwa.googlegroups.com:
>
>
> > I am glad you found almost complete accord with Box's view about
> > science (even if not about statistics, of which you didn't say
> > much. :-))
>
> History of science is sort of a hobby of mine. You spoke of special
> mentors earlier, and I think I was exposed to one of those, which
> resulted in my filling most of my meager humanities requirements within
> my engineering program with History of Science coursework.
< snip >
>
> As far as stats goes, we're probably a lot closer in outlook than you
> think. I use stats as a tool in neuroscience, and I learned linear
> models from a very practical point of view-- from an Operations Research
> Department, no less.
You might have fared better than what you've shown in your arguments
against me on the subject of Linear Regression models had you taken
a course on the subject of regression from a statistician in a
competent department of statistics.
< snip >
> It's been a while since I've taken all my formal
> coursework, though, and more importantly, I've never had to teach stats
> (and hopefully never will).
Scott, given your autobiographical sketch about your background in
statistics, I tend to be sympathetic about the faux pas and
misdemeanors you've made in the Linear Regression models threads.
In the future parade of those who chose to hang themselves by the
ropes I offered, I'll probably even leave you out of that parade.
I trust by the time you've read my yet-to-appear Part II on
Linear models with RESTRICTIONS, you will have understood the
lack of substance of "science" in the ONE single model in
Kendall and Stuart which aroused more NOISE MAKERS than Chinese
folks celebrating the Chinese New Year, and that you will have
a much better understanding about the standard meaning of a
Linear Model as well as a non-linear model, in Statistics.
In any event, I am comforted to hear your self-confession :-)
Scott> I've never had to teach stats (and hopefully never will).
I think we'll get along marvellously discussing SCIENCE, and
your hobby, the history of science. :)
-- Bob.
Bob introduces the word "purely" here, for the first time,
which changes the meaning to something where he can be right.
I was describing knowledgeable model-building, from the start.
> (WITHOUT any experimental control) to draw "causal inference" from
> correlational data; or to build "explanatory" models by using the
> "expected SIGN fallacy/abuse in multiple regression". These
> abuses have been extensively discussed by me in this forum against
> the malpractitioners. Don't argue them HERE. Just go to the
> archives and pick up on those threads or start new threads on those
> subjects.
[snip, rest of Bob's post]
I won't argue them here. I think Bob and I are done --
we seem to be discussing disjunct subjects.
I just point once more to Bob's bad habits. The archives
must show Bob failing to *discuss*, again and again.
They are paired monologues.
My posts repeatedly seek engagement, in both examples and
principle, and compromise, in the form of criticism of
"bad examples". Bob's posts eventually display the unpleasant
side of his temper, especially after I applauded his Tukey-Mosteller
citations and other generalities about being careful.
Perhaps JD will want to defend his own web page, which
was the starting point --
From April 17,
===== from Bob's post
> Jerry Dallal gave a simple expository explanation of these concepts
> of "leverage points" and "influential points" in the link:
> http://www.tufts.edu/~gdallal/diagnose.htm
Having endorsed Jerry's expository explanation, I found a disturbing
statement which I feel obliged to explain my complete disagreement:
JD> Belsley, Kuh, and Welsch. Roy Welsch tells of getting interested
JD> in regression diagnostics when he was once asked to fit models
JD> to some banking data. When he presented his results to his clients,
JD> they remarked that the model could not be right because the sign
JD> of one of the predictors was different from what they expected.
The "expected sign" of a (multiple) regression coefficient is the one
single ERROR most often committed by social scientists and economists
in their interpretation of regression coefficients.
Over the years, I have not found a SINGLE CASE in which a
justification was given (nor hinted) on where the "expectation"
of the expected sign came from.
==== end of quote from Bob's.
re: the last paragraph above -
Bob has failed to say, about 8 times now, why the example
of "epidemiology" does not meet his requirement, or what
his opinion of epidemiology is. I suppose his revision to
"purely observational" could be an attempt to get around that.
--
Rich Ulrich, wpi...@pitt.edu
http://www.pitt.edu/~wpilib/index.html
Richard Ulrich wrote:
> On 13 Jun 2005 10:21:53 -0700, "Reef Fish"
> <Large_Nass...@Yahoo.com> wrote:
> [snip, a bunch]
> >
> > Now I turn to comment on Jerry Dallal's points.
> >
> > JD [..." My simple responses would be along the lines of noting that
> using statistics to determine the process that *generated* the data
> gets into cause and effect, which is not a good place to go. "]
> RF >
> > That's putting it FAR too mildly. Statistical Quackery and Malpractice
> > occur every day when SOME "social scientists" and applied statisticians
> > who are black-magicians from all walks use purely observational data
>
> Bob introduces the word "purely" here, for the first time,
> which changes the meaning to something where he can be right.
The word "purely" simply emphasized the fact that it was in the
ABSENCE of any "designed experiment", so it means exactly the same
as "observational studies".
The "WITHOUT experimental control" was explicitly pointed out below.
>
> > (WITHOUT any experimental control) to draw "causal inference" from
> > correlational data; or to build "explanatory" models by using the
> > "expected SIGN fallacy/abuse in multiple regression". These
> > abuses have been extensively discussed by me in this forum against
> > the malpractitioners. Don't argue them HERE. Just go to the
> > archives and pick up on those threads or start new threads on those
> > subjects.
> [snip, rest of Bob's post]
>
> I won't argue them here. I think Bob and I are done --
> we seem to be discussing disjunct subjects.
But then you continue to argue below!!!
>
< Richard's characteristic ad hominem attack without reference to
the actual STATISTICAL issues snipped >
>
> Perhaps JD will want to defend his own web page, which
> was the starting point --
>
> From April 17,
> ===== from Bob's post
> > Jerry Dallal gave a simple expository explanation of these concepts
> > of "leverage points" and "influential points" in the link:
>
>
> > http://www.tufts.edu/~gdallal/diagnose.htm
>
Perhaps Jerry Dallal, given this "expected sign FALLACY" pointed
out by me, in what he cited about the Belsley, Kuh, and Welsch
book, will MODIFY his page in the light of the extended discussion
of WHY one of the authors erred. I believe it was Belsley,
because Belsley had another book he was trying to get published
by Wiley, on the same subject.
That was about 15 years ago. I have it from a good source (a reviewer),
who strongly recommended AGAINST its publication for the same
error he committed, as well as other statistical heresy Belsley was
pushing. That manuscript/book was never published, AFAIK. Kudos
to Wiley and its editor for their decision on the manuscript.
>
> Having endorsed Jerry's expository explanation, I found a disturbing
> statement which I feel obliged to explain my complete disagreement:
>
> JD> Belsley, Kuh, and Welsch. Roy Welsch tells of getting interested
> JD> in regression diagnostics when he was once asked to fit models
> JD> to some banking data. When he presented his results to his clients,
> JD> they remarked that the model could not be right because the sign
> JD> of one of the predictors was different from what they expected.
That quotation alone only showed that the BANKER was wrong. But
the expected SIGN fallacy is quite independent of the subject of
"regression diagnostics" in which I have directed three doctoral
dissertations and know quite a bit about the subject.
>
>
> The "expected sign" of a (multiple) regression coefficient is the one
> single ERROR most often committed by social scientists and economists
> in their interpretation of regression coefficients.
>
>
> Over the years, I have not found a SINGLE CASE in which a
> justification was given (nor hinted) on where the "expectation"
> of the expected sign came from.
>
> ==== end of quote from Bob's.
> re: the last paragraph above -
That statement stands. "where the expected sign came from"
referred to the TECHNICAL justification, from the point of view
of partial correlations (vs the error of simple correlation
MIS-interpretation which was obviously how the BANKER and
others erred).
> Bob has failed to say, about 8 times now, why the example
> of "epidemiology" does not meet his requirement, or what
> his opinion of epidemiology is.
For the n-th time, your question is ENTIRELY IRRELEVANT to the
statistical errors YOU made YOURSELF, entirely contained in the
archives of this newsgroup, and entirely independent of your
epidemiology red-herring.
> I suppose his revision to
> "purely observational" could be an attempt to get around that.
You read entirely too much Freudian interpretation into it, out of your
own illusion of grandeur. You are a NOBODY (an incompetent
mal-practitioner of statistics). Other than exposing YOUR
specific ERRORS, nothing I write was catered to YOU, and the
history of your posts has shown that your own PARANOIA resulted
from your Illusion of Grandeur.
Get yourself EDUCATED in Statistics, Richard Ulrich.
Stop wasting bandwidth trying to salvage your unsalvageable
ill-repute, established soundly and firmly by YOURSELF.
-- Bob.
Does somebody want to tell me what's going on?
In 1988, Roy Welsch offered a short course in Regression Diagnostics at
the Biometric Society ENAR meeting. He began it with the anecdote I
relate on my web page that is quoted here. I was in the audience.
--Jerry
I certainly hope Roy wasn't condoning the Banker's "expected sign"
fallacy without giving a statistical argument based on "partial
correlation" for why that coefficient was "wrong".
Roy could have been using that (without going into the "expected sign"
issue) as an introduction to the theme in regression diagnostics
that certain high leverage points can have a tremendous ACTUAL
influence on ALL the signs of the regression coefficients: remove
such a point and the signs may change.
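To make the leverage point concrete, here is a minimal Python sketch (standard library only) for the one-predictor case. The data and the helper names `ols_line` and `leverages` are made up for illustration; it uses the usual simple-regression leverage h_i = 1/n + (x_i - xbar)^2 / Sxx and shows one far-out point reversing the sign of the fitted slope.

```python
from statistics import mean

def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def leverages(x):
    """Hat-matrix diagonal for simple regression: 1/n + (x_i - xbar)^2 / Sxx."""
    n, mx = len(x), mean(x)
    sxx = sum((a - mx) ** 2 for a in x)
    return [1 / n + (a - mx) ** 2 / sxx for a in x]

# Hypothetical data: five points on a line with slope -1 ...
x_clean = [1.0, 2.0, 3.0, 4.0, 5.0]
y_clean = [5.0, 4.0, 3.0, 2.0, 1.0]
# ... plus one point far out in x-space.
x_all = x_clean + [20.0]
y_all = y_clean + [30.0]

_, slope_without = ols_line(x_clean, y_clean)
_, slope_with = ols_line(x_all, y_all)
h = leverages(x_all)

print(f"slope without the extra point: {slope_without:.3f}")
print(f"slope with the extra point:    {slope_with:.3f}")
print(f"leverage of the extra point:   {h[-1]:.3f}")
```

The leverages always sum to the number of fitted coefficients (here 2), and a common rule of thumb flags points whose leverage exceeds twice the average, 2p/n; the added point sits far above that.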
In any event, whatever Roy's intended point was, you should go
to Mosteller and Tukey (or my cited references of the specifics
there) to read WHY the "expected sign" is a fallacy in multiple
regression in the ABSENCE of justification of the FORM and
SUBSTANCE of all the OTHER variables, or the justification of
the SIGN of the partial correlation, BEFORE or AFTER the model
had been fitted with DATA.
That's the crux of the issue.
The simplest explanation of the actual story was that the BANKER
looked at the sign and FALSELY and INVALIDLY concluded that the
sign should be the same as the sign of the SIMPLE correlation
between that variable and the dependent variable. The banker
wouldn't KNOW what a partial correlation is, let alone REASON
from a partial correlation angle that a sign is right or wrong.
That's precisely the "expected SIGN" fallacy -- confusing the
sign of a partial correlation with the "expected sign" of a
simple correlation (which anyone can easily reason about correctly)
-- but the simple correlation sign is the WRONG sign to expect
in a multiple regression!
That's the bottom line, Jerry.
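A minimal Python sketch (standard library only) of how the simple-correlation sign and the multiple-regression sign can disagree. The data are hypothetical, constructed so that y = 3*x2 - x1 exactly while x2 tracks x1 closely; `two_predictor_ols` is a helper written only for this illustration, solving the centered normal equations for two predictors.

```python
from statistics import mean

def centered(v):
    m = mean(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(u * w for u, w in zip(a, b))

def two_predictor_ols(x1, x2, y):
    """OLS slopes (b1, b2) for y = b0 + b1*x1 + b2*x2 via the centered normal equations."""
    c1, c2, cy = centered(x1), centered(x2), centered(y)
    s11, s12, s22 = dot(c1, c1), dot(c1, c2), dot(c2, c2)
    s1y, s2y = dot(c1, cy), dot(c2, cy)
    det = s11 * s22 - s12 ** 2  # zero only if x1 and x2 are exactly collinear
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det

# Hypothetical data: x2 stays close to x1, and y = 3*x2 - x1 with no noise.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
wiggle = [0.5, -0.5, 0.3, -0.3, 0.4, -0.4, 0.2, -0.2]
x2 = [a + d for a, d in zip(x1, wiggle)]
y = [3 * b - a for a, b in zip(x1, x2)]

c1, cy = centered(x1), centered(y)
simple_slope = dot(c1, cy) / dot(c1, c1)  # same sign as the simple correlation r(x1, y)
b1, b2 = two_predictor_ols(x1, x2, y)     # b1 carries the sign of the partial correlation

print(f"simple slope of y on x1:    {simple_slope:+.3f}")
print(f"multiple-regression b1, b2: {b1:+.3f}, {b2:+.3f}")
```

Here y rises with x1 on its own, yet the coefficient of x1, given x2, is negative. Nothing in the simple relation tells you the sign to "expect" for b1.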
It's entirely unnecessary to bring Roy or Belsley, Kuh, or
Welsch into the act. The issue is clear cut and unequivocal,
from the THEORY of Multiple Linear Regression analysis and
from the mathematical form of each estimated coefficient, which
contains very specific "partial correlation" information,
and NOT the sign of any simple correlation coefficient.
Specifically, for SIMPLE regression, the slope estimate
can be written as
b = r (sy/sx)
where r is the SIMPLE correlation between x and y, and sy and sx
are the standard deviations of y and x respectively.
In ANY multiple regression, the FORM of the estimated coefficient
is EXACTLY the same,
bi = r(xi,y|all xj, j ne i) sy*/sxi*
where r(xi,y|all xj, j ne i) is the partial correlation between xi
and y, given all the other xjs in multiple regression equation.
The sy* and sxi* are the partial standard deviations of y and xi
respectively, given all the other xj's (that is, the standard
deviations of the residuals of y and of xi after each has been
regressed on all the other xj's).
FINALLY, you can actually COMPUTE the appropriate residuals, and do
a SIMPLE regression between the two sets of RESIDUALS to get
precisely the multiple regression coefficient for any xi.
I was thrilled the first time I learned/discovered this. I am
still thrilled at the SIMPLICITY of the theory three decades later!
That's the way it should be TAUGHT in the first place -- and it's
the way I taught it. I have even made more than one post in this ng
detailing that THEORETICAL result, which can easily be verified
empirically.
That's COMPREHENSION, if you really want to know what's behind a
"partial correlation" and what made up the estimated multiple
regression coefficients!
Standard and BASIC stuff that not many statisticians know. <sigh>
That is the April post in which I explained the above together
with the fact that ALL correlations (Multiple, Partial, or Simple)
CAN be viewed as SIMPLE correlations, provided you know how to do it!
Here's a follow-up to the preceding post,
which was a "conceptual exercise" for any reader to test if they
understood the partial correlation concept and its relation to
multiple regression.
In particular, if you have a regression package that can do NOTHING
except a SIMPLE regression, the theory says (and can be verified
empirically -- as MY students had to do) that you can perform ANY
Multiple Regression with this simple-regression machine/package --
so long as you can keep track of all the variables and residuals
for all the necessary simple regressions.
Very tedious way of doing ANY multiple regression, but a very simple
CONCEPTUAL scheme!! A truly GOOD pyramid scheme of building
higher order partial correlations from their lower-order
counterparts!
Jerry, I suspect you knew some of these (probably not all) from some
of your "multiple regression" material, if not from Frank Anscombe's
papers on residuals. The two posts I cited above are completely
self-sufficient, in terms of relating ALL correlations to simple
correlations AND, above all, in showing WHY the multiple regression
coefficients DO NOT reflect the signs of SIMPLE correlations, but of
PARTIAL correlations.
The EMPIRICAL verification of the theoretical results can easily be
done (on two or three independent variables) on ANY statistical package
with multiple regression capabilities as well as the ability to save
residuals and use them as data for simple regressions.
This lecture is free.
But a reader will have to learn, understand, and absorb it before
the lessons are valuable.
ALL "expected SIGN" abusers should study the material carefully and
understand the difference between the signs of simple and partial
correlations!
This post will be PERMANENTLY in the archives of groups.google.com,
should I need to refer Roy or anyone else to it. :-)
-- Bob.
> I certainly hope Roy wasn't condoning the Banker's "expected sign"
> fallacy without giving a statistical argument on "partial
> correlation" in order to argue why that coefficient was "wrong".
>
> Roy could be using that (without going into the "expected sign"
> issue) as an introduction to the theme in regression diagnostic
> that certain high leverage points could have tremendous ACTUAL
> influence on ALL signs of a regression coefficient if that point
> were not there.
>
> In any event, whatever Roy's intended point was, you should go
> to Mosteller and Tukey (or my cited references of the specifics
> there) to read WHY the "expected sign" is a fallacy in multiple
> regression in the ABSENCE of justification of the FORM and
> SUBSTANCE of all the OTHER variables, or the justification of
> the SIGN of the partial correlation, BEFORE or AFTER the model
> had been fitted with DATA.
I know the chapter. They don't say you can't do it. (Neither do you.)
They say you've got to be careful to take the other predictors into
account. I'm at home and my book is in the office. I've forgotten the
Tukey-ism. Something like "co-factor set".
As for Roy Welch, you'll have to take that up with him. I was at his
short course and that's how he opened it. The point was to impress us
with the importance of diagnostics and how he got involved studying
them. The message I got was that a banker had rejected an analysis Roy
had performed because it did not meet the banker's expectation and when
Roy looked closely at the data he saw that the banker was correct.
I don't see that the story is inconsistent with what you're saying. It
wasn't some yahoo making throwaway comments. It was a banker, with the
good sense to consult with Roy, looking at financial data. That is, it
was someone who was likely to be capable of appreciating "the form and
substance of the other variables".
--Jerry
Jerry Dallal wrote:
> Reef Fish wrote:
>
> > I certainly hope Roy wasn't condoning the Banker's "expected sign"
> > fallacy without giving a statistical argument on "partial
> > correlation" in order to argue why that coefficient was "wrong".
> >
> > Roy could be using that (without going into the "expected sign"
> > issue) as an introduction to the theme in regression diagnostic
> > that certain high leverage points could have tremendous ACTUAL
> > influence on ALL signs of a regression coefficient if that point
> > were not there.
> >
> > In any event, whatever Roy's intended point was, you should go
> > to Mosteller and Tukey (or my cited references of the specifics
> > there) to read WHY the "expected sign" is a fallacy in multiple
> > regression in the ABSENCE of justification of the FORM and
> > SUBSTANCE of all the OTHER variables, or the justification of
> > the SIGN of the partial correlation, BEFORE or AFTER the model
> > had been fitted with DATA.
>
> I know the chapter. They don't say you can't do it. (Neither do you.)
> They say you've got to be careful to take the other predictors into
> account. I'm at home and my book is in the office. I've forgotten the
> Tukey-ism. Something like "co-factor set".
More than that! Careful is one thing; arguing a priori, BEFORE seeing
any data, what the sign is, or much worse (as argued and published by
some) when they don't even KNOW what the other variables in the
equation are -- THAT's the kind of abuse I am talking about. It's
theoretically and practically UNTENABLE in the latter case; and it
has to be argued from the partial correlation point of view in the
"careful" part -- otherwise, don't "expect" any sign -- let the
DATA dictate what sign it has, and THEN you can examine carefully
(again from a partial correlation point of view) whether the
sign, different from that of a simple correlation, is reasonable
or not.
>
> As for Roy Welch, you'll have to take that up with him. I was at his
> short course and that's how he opened it. The point was to impress us
> with the importance of diagnostics and how he got involved studying
> them. The message I got was that a banker had rejected an analysis Roy
> had performed because it did not meet the banker's expectation and when
> Roy looked closely at the data he saw that the banker was correct.
If you had expressed it that way, it would not have immediately raised
the unmistakable red-flag. The above would have been quite
reasonable, and may even be perfectly reasonable, had YOU not said,
JD> they remarked that the model could not be right because the sign
JD> of one of the predictors was different from what they expected.
That is an UNMISTAKABLE indication that the banker had mistaken the sign
to reflect the sign of the SIMPLE correlation.
Read my posts on partial correlation carefully, and you'll appreciate
WHY even GOD could not have made that statement "that the model could
not possibly be right because the sign was ..." Of course it COULD
be right, given the other variables AND the actual data -- that's the
point!
>
> I don't see that the story is inconsistent with what you're saying. It
> wasn't some yahoo making throwaway comments. It was a banker, with the
> good sense to consult with Roy, looking at financial data. That is, it
> was someone who was likely to be capable of appreciating "the form and
> substance of the other variables".
Yahoo or banker, had it NOT been for the statement about the sign
attributed by YOU to the banker, there wouldn't have been any problem --
as soon as the sole reason given was the SIGN, as stated by you, it's
beyond a shadow of a doubt that the "expected sign fallacy" had been
committed.
>
> --Jerry
Forget about the story. Forget about Roy. Forget about your
inconsistent description in the present post from what was cited on
your webpage.
Study my posts on PARTIAL CORRELATIONS and how they relate to the signs
of multiple regression coefficients. Go through this mental exercise:
Let Y be the GPA of students at Podunk U at the end of their 2nd
year.
Let these be the independent variables in a multiple regression on
the data of a random (OR stratified) sample of 1,000 students.
It has been consistently observed, as well as being reasonable to expect,
that the SIMPLE correlations between these variables and the GPA
variable are positive, probably significantly positive
(in the sense of rejecting Ho: rho = 0, as discussed in
a different thread), because all it would take is for the sample
correlation to exceed approx. 2/sqrt(1000), i.e., r > 0.06.
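The threshold can be checked from the usual t statistic for a correlation, t = r sqrt(n-2)/sqrt(1-r^2), solved for r, giving r = t/sqrt(n-2+t^2). A sketch assuming a two-sided 5% test and the normal approximation (about 1.96) to the t quantile with 998 degrees of freedom:

```python
import math
from statistics import NormalDist

n = 1000
z = NormalDist().inv_cdf(0.975)          # about 1.96; normal approximation to t with n-2 df
r_crit = z / math.sqrt(n - 2 + z ** 2)   # solve t = r*sqrt(n-2)/sqrt(1-r^2) for r
rule_of_thumb = 2 / math.sqrt(n)         # the 2/sqrt(n) shortcut

print(round(r_crit, 4), round(rule_of_thumb, 4))
```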
Let the independent variables be:
1. The student's SAT Math score in the Senior yr in high school
2. The student's SAT Verbal score in the Senior yr in HS
3. The student's GPA in HS
4. The student's Math Achievement Score in the Sr yr in HS
When you do a multiple regression of the Y (GPA at the end of
Sophomore year) on these 4 variables, do (can) you expect the
signs of the regression coefficients to be ALL positive?
Can you expect the sign <as the banker did> on ANY of these four
variables in the fitted multiple regression equation? If so,
tell us why.
Hint: You CANNOT validly argue (even if you knew all about partial
correlations and their relation to the regression coefficients
<as I do>) that the model cannot possibly be right if one of the
signs is not what you expect <no matter WHAT sign you expect>.
After you've gone through this mental exercise, AND studied the
theory behind partial correlations, you'll be in an infinitely
better position to appreciate the folly of the "expected sign"
fallacy and common abuse.
-- Bob.
How so, Bob? There is nothing in the anecdote to suggest that the
bankers had seen any simple linear regressions or correlations. Nor is
there any indication, for that matter, about whether the data were
observational or experimental. Who's to say that their expectation was
not based on expertise in the field, and a wealth of experience with the
variables in question?
Cheers,
Bruce
>>As for Roy Welch, you'll have to take that up with him. I was at his
>>short course and that's how he opened it. The point was to impress us
>>with the importance of diagnostics and how he got involved studying
>>them. The message I got was that a banker had rejected an analysis Roy
>>had performed because it did not meet the banker's expectation and when
>>Roy looked closely at the data he saw that the banker was correct.
>
>
> If you had expressed it that way, it would not have immediately raised
> the unmistakable red-flag. The above would have been quite
> reasonable,
> and may even be perfectly reasonable, had YOU not said,
>
> JD> they remarked that the model could not be right because the sign
> JD> of one of the predictors was different from what they expected.
>
> That is an UNMISTABLE indication that the banker had mistaken the sign
> to reflect the sign of the SIMPLE correlation.
>
> Read my posts on partial correlation carefully, and you'll appreciate
> WHY even GOD could not have made that statement "that the model could
> not be possibly right because the sign was ..." Of course it COULD
> be right, given the other variables AND the actual data -- that's
> point!
>
Sorry, Bob. It's been a long day. My web page is correct and is what
my contemporaneous notes show. What Roy related as specifically not
meeting with the banker's expectation *was* the sign of a regression
coefficient. As I said, you'll have to take it up with Roy. That's
what he said in '88 and that's what I wrote down.
>
>>I don't see that the story is inconsistent with what you're saying. It
>>wasn't some yahoo making throwaway comments. It was a banker, with the
>>good sense to consult with Roy, looking at financial data. That is, it
>>was someone who was likely to be capable of appreciating "the form and
>>substance of the other variables".
>
>
> Yahoo or banker, had it NOT been the statement of the banker about the
> attributed by YOU, there wouldn't have been any problem -- as soon
> as the sole given reason was the SIGN, as stated by you, it's beyond
> a shadow of a doubt that the "expected sign fallacy" had been
> committed.
>
I couldn't see into Roy Welch's or the banker's head, but I would
understand the comment to mean "given their informed
judgment/expectation in light of the other variables in the model".
While I wouldn't bet the ranch on the sign of a regression coefficient,
the more experience was gained with a particular type of situation, the
more I would "expect" some things to hold. For example, in any adult
population, I would expect the correlation between height and weight to
be positive, both with and without adjusting for an indicator variable
denoting sex. If I found a sample for which this wasn't the case, it
would definitely give me pause. As other variables get added to the
model, my sense about what happens to the signs of various (partial)
correlations becomes less certain, but if I were to fit the same model
in a wide variety of situations, I might come to expect certain things.
I think this is consistent with what Mosteller, Tukey, Box, and you are
saying about model building and how it is important to understand how
things change depending on the cofactor set (if that's the right term).
It's the scientific method at work. We hypothesize and then design
experiments to test the hypotheses. There's nothing in the method that
forbids the use of observational data as long as the experiments are
properly designed, analyzed, and *interpreted*. It's the interpretation
where things usually fall apart, but that's the fault of the
interpreters for not realizing what their data allow them to say. It's
not the method itself that's at fault.
Something a little more applied. If in a wide variety of settings, I
see that some nutrient is related to cholesterol levels and that the
sign stays the same no matter what other relevant variables I adjust
for, I come to expect it and am surprised when the sign changes in a
particular ethnic population. There are many reasons why I might be
surprised. The first two that come to mind are genetics or that perhaps
there's something else going on such as another relevant variable, not
in my cofactor set, that matters in this setting.
On the other hand, maybe the sign is always the same. Some would stop
there (failing to recognize that cross sectional data do not tell us
what will happen to an individual over time!). I'd say that we now have
some evidence for going ahead with a randomized trial.
I *think* what we're discussing here is intelligent model building as
opposed to throwing everything into a multiple regression package to see
what pops out. I'm talking about the former. The latter is worse than
useless because others will act on it if it suits their agenda. Better,
in that case, that it not be done.
But it's easy to CORRECTLY expect the signs of simple regressions
between many pairs of variables. If Y is cost of gasoline and X is
the number of miles driven between fills, I or anyone else would
expect the correlation to be positive. That's why it's so EASY
to think about SIMPLE correlations.
There was no hint or indication that the banker thought of the
partial correlations which are crucial to thinking about
expected signs in multiple regressions.
> Nor is
> there any indication, for that matter, about whether the data were
> observational or experimental.
That is irrelevant for the "expected sign" phenomenon.
> Who's to say that their expectation was
> not based on expertise in the field, and a wealth of experience with the
> variables in question?
Try the conceptual exercise I suggested to Jerry, and you'll perhaps
see why. I know as much about those variables (in many admission
studies) as anyone. I had a wealth of experience with those variables
in real data. I CANNOT say what SIGN I can expect, with the variables
and context given. I certainly CANNOT validly say if one of the
signs is negative then the model must be wrong.
Can you, or Jerry, or ANYONE?
My INABILITY to "expect signs" is based on a combination of VAST
experience with BOTH theory and practice on that problem and those
variables.
-- Bob.
On 14 Jun 2005 19:10:23 -0700, "Reef Fish"
<Large_Nass...@Yahoo.com> wrote:
>
>
> Jerry Dallal wrote:
> > Reef Fish wrote:
> >
> > > I certainly hope Roy wasn't condoning the Banker's "expected sign"
> > > fallacy without giving a statistical argument on "partial
> > > correlation" in order to argue why that coefficient was "wrong".
> > >
> > > Roy could be using that (without going into the "expected sign"
> > > issue) as an introduction to the theme in regression diagnostic
> > > that certain high leverage points could have tremendous ACTUAL
> > > influence on ALL signs of a regression coefficient if that point
> > > were not there.
> > >
> > > In any event, whatever Roy's intended point was, you should go
> > > to Mosteller and Tukey (or my cited references of the specifics
> > > there) to read WHY the "expected sign" is a fallacy in multiple
> > > regression in the ABSENCE of justification of the FORM and
> > > SUBSTANCE of all the OTHER variables, or the justification of
- yes, I've essentially insisted on "justification of the form
and substance of all the other variables" - Read my posts?
> > > the SIGN of the partial correlation, BEFORE or AFTER the model
> > > had been fitted with DATA.
JD > >
> > I know the chapter. They don't say you can't do it. (Neither do you.)
> > They say you've got to be careful to take the other predictors into
> > account. I'm at home and my book is in the office. I've forgotten the
> > Tukey-ism. Something like "co-factor set".
RF>
> More than that! Careful is one thing, arguing a prior, BEFORE seeing
> any data, what the sign is;
I guess Bob still doesn't like Jerry's banker-example; or my
epidemiology example.
> or much worse (argued and published by
> some) when they don't even KNOW what the other variables in the
> equation are -- THAT's the kind of abuse I am talking about.
So *that* is the straw-man that Bob has laid on me.
Bob reads badly, and that's the truth of it.
I've talked about building models and being *careful*
with the variables. My stats-FAQ has a bunch of old
posts on the hazards of stepwise selection as practiced
in social sciences. Bob has other modeling in mind.
> It's
> theoretically and practically UNTANABLE in the latter case; and
UNTANABLE ? unattainable?
> has to be argued from the partial correlation point of view in the
> "careful" part -- otherwise, don't "expect" any sign -- let the
> DATA dictate what sign it has, and THEN you can examine carefully
> (again from a partial correlation point of view whether the
> sign, different from that of a simple correlation, is reasonable
> or not.
Huh? That sounds redundant ... checking on the validity of
the program? In model building, I advise checking for
what-is-suppressing-what, and -- when using rating scales
with mediocre scaling -- checking for scaling problems, etc.
See what the logic is, based on what the (few) variables *mean*.
>
[... ]
>
> Study my posts on PARTIAL CORRELATOINS and how they relate to the signs
> of multiple regression coefficients. Go through this mental exercise:
>
> Let Y be the GPA of a students at Podunk U at the end of their 2nd
> year.
[snip, nice simple example, with more comments than
the previous time. ]
>
> When you do a multiple regression of the Y (GPA at the end of
> Sophomore year) on these 4 variables, do (can) you expect the
> signs of the regression coefficients to be ALL positive?
All positive?
No, not right off. Maybe, not at all. Bob, did you forget? I
offered the parallel example -- of "good social science" --
where a (national) standardized achievement test had used
"speed" as an intentional suppressor for other Verbal scoring.
That was done as a rational choice, and *not* as a blind
acceptance of the equation-on-hand.
If you check the scaling and what-is-suppressing/ suppressed
for variables, looking especially among outliers, you *may* be
able to rescale the scores to create a more robust predictor.
(Historically, inexplicable suppressors replicate poorly.)
IF scaling is the problem. When you are building a *meaningful*
model, you have to be satisfied with the meaning, or else express
your puzzlement. When you are building a blind predicting
model, you can use your internal cross validation when you
have sufficient N, and keep a closer eye out for regime-change
that will invalidate the model in future extrapolations.
>
> Can you expect the sign <as the banker did> on ANY of these four
> variables in the fitted multiple regression equation? If so,
> tell us why.
> [ ... ]
Four years before I entered grad school, I was explaining
that basic crap to others. I did have to read up on partial
correlations at the time. Explaining it to someone else
so *they* could understand it ... that was good practice.
Do you want to tell us why you attribute a simplistic
strawman to me? Read badly?
Bob seems comfortable with models that have large
numbers of variables, so having enough knowledge is
tough or impossible; but a large N makes up for
inefficiency, for short-term answers.
Before, Bob had seemed intolerant of rational models with
a purpose of being meaningful. Above, he finally offered
(it seems) that one might know enough about the variables.
Probably not.
> Can you expect the sign <as the banker did> on ANY of these four
> variables in the fitted multiple regression equation? If so,
> tell us why.
>
From what you've given, no. However, if I'd already gone through this
exercise with data from a few dozen "similar" schools and *always*
observed the same pattern, I might wonder what was going on if the
pattern was violated, especially if it was the sign of a variable that
always showed a strong partial correlation.
[If it were a friendly wager, like matching pennies--that is, betting on
random phenomena to pass the time--I might go with a positive
coefficient for SAT verbal. Reasoning: In the one dataset like this
that I've seen, the ability to read dominated everything else. Since
the students come from different HSs, HS GPA is probably a poor
predictor (I suspect HS itself, regardless of grade, is a better
predictor!) So, in a friendly wager, I'd go with a positive multiple
regression coefficient for SAT Verbal on the theory that there's nothing
that will swamp it or is sufficiently correlated with it to turn into a
surrogate. With my luck, though, everyone will have scores between 500
and 600 and attenuation will make its sign as volatile as the others.]
--Jerry
And that's when I saw the red-flag waving.
>
>
> >
> >>I don't see that the story is inconsistent with what you're saying. It
> >>wasn't some yahoo making throwaway comments. It was a banker, with the
> >>good sense to consult with Roy, looking at financial data. That is, it
> >>was someone who was likely to be capable of appreciating "the form and
> >>substance of the other variables".
> >
> >
> > Yahoo or banker, had it NOT been the statement of the banker about the
> > attributed by YOU, there wouldn't have been any problem -- as soon
> > as the sole given reason was the SIGN, as stated by you, it's beyond
> > a shadow of a doubt that the "expected sign fallacy" had been
> > committed.
> >
>
> I couldn't see into Roy Welch's or the banker's head, but I would
> understand the comment to mean "given their informed
> judgment/expectation in light of the other variables in the model".
It's certainly POSSIBLE, but highly improbable. Even in the
possible case, no one could validly make the statement he did that
the "unexpected sign" could not possibly be right. It COULD be
right. Always possible, in the highly complex world of partial
correlational information.
>
> While I wouldn't bet the ranch on the sign of a regression coefficient,
> the more experience was gained with a particular type of situation, the
> more I would "expect" some things to hold. For example, in any adult
> population, I would expect the correlation between height and weight to
> be positive, both with and without adjusting for an indicator variable
> denoting sex. If I found a sample for which this wasn't the case, it
> would definitely give me pause. As other variables get added to the
> model, my sense about what happens to the signs of various (partial)
> correlations becomes less certain, but if I were to fit the same model
> in a wide variety of situations, I might come to expect certain things.
All of this is certainly true and possible. But reality says
otherwise. You let the DATA show you WHERE to explore -- instead
of always confirming your "expectation" even when it turned out
to be WRONG. That's what the abusers do. They see a "wrong"
sign (to their fallacious expectation based on simple
correlation ideas), when the sign is actually "right". They
FORCE the sign to be opposite (by hook or by crook, such as
Ridge regression and other voodoo). They end up with the
"correct" sign (in their minds) but the incorrect/wrong model.
They would have been infinitely better off NOT to expect any
sign in the first place. No ifs or buts about it.
>
> I think this is consistent with what Mosteller, Tukey, Box, and you are
> saying about model building and how it is important to understand how
> things change depending on the cofactor set (if that's the right term).
What's "this"? I know their ways of model building VERY WELL. I
practice pretty much the same thing. They wouldn't be caught dead
(too late to make this statement about Tukey now) in any data
analysis with the banker type of "expected sign". Neither would I.
>
> It's the scientific method at work. We hypothesize and then design
> experiments to test the hypotheses. There's nothing in the method that
> forbids the use of observational data as long as the experiments are
> properly designed, analyzed, and *interpreted*. It's the interpretation
> where things usually fall apart, but that's the fault of the
> interpreters for not realizing what their data allow them to say. It's
> not the method itself that's at fault.
All of this PRECLUDES the "expected sign" fallacy, when it is
theoretically and practically unjustified and UNNECESSARY.
< snip >
>
> I *think* what we're discussing here is intelligent model building as
> opposed to throwing everything into a multiple regression package to see
> what pops out. I'm talking about the former.
I certainly hope so. But you certainly haven't had time to go through
the intellectual exercise of the specific GPA data example, nor time
to absorb all the partial correlation intricacies in my post, let alone
put those TOGETHER and think hard about them.
You're merely rehashing standard data analysis techniques no one
is refuting.
The latter of throwing everything including the kitchen sink into
a regression problem to grind out a good fit is the STEAMROLLER
abuse. That's a strawman and a different form of abuse.
The "expected sign" abusers are producing far more WRONG results
than the Steamrollers do -- which of course is my subjective
opinion, having read many papers by sign abusers who didn't even
KNOW what the other variables are, in the publications of their
theoretical or practical results. If I get $10 for each such
paper I find in the published journal articles, I would be VERY
RICH quickly by going to journals like "Journal of Political
Economy", and various journals of economics, and social sciences.
> The latter is worse than
> useless because others will act on it if it suits their agenda. Better,
> in that case, that it not be done.
Jerry, do us a favor and comment on the GPA example and the questions
I posed you. Take your time. Tell us HOW you think you, or anyone
else can argue or presuppose what each or ANY of the signs should be.
Furthermore, WHY would anyone expect ANY sign to be positive or
negative? Other than of course the ability to explain to the
unwashed why increasing an SAT score will lower the predicted
GPA in some consistently stable models! THAT's the real reason
why folks practice the "expected sign" abuse: they don't
KNOW the partial correlation concept, or they can't explain the
concept when the sign "looks wrong" even to supposedly educated
statisticians. One certainly can't expect the layman to
understand the "wrong sign" that is actually right!
But for PREDICTION purposes, there's nothing to explain. The
models are NOT causal. It's hidden. Then, the bit about
explaining the sign or expecting any sign becomes a frivolous
exercise in futility that leads only to abuse, never
enlightenment!
YMMV on some OTHER kinds of model building.
-- Bob.
There are only FOUR independent variables. There is no steamroller
to roll, nor anything an intelligent data analyst could not do WELL,
except that NO DATA ANALYST worth his salt would make a banker-like
claim about what SIGN to expect on that simple problem.
-- Bob.
> Godel
> pulled the rug under the mathematicians about the impossibility of
> devising a system of mathematics that is completely logically
> consistent within itself.
He did nothing of the kind.
> Russell and Whitehead threw up their
> hands after toiling for 20 years on the Principia Mathematica;
PM was published in 1913. What do you take to be the relevance of
Godel's theorem to PM?
> and Herman and Jean Rubin wrote a little book about the equivalence
> of dozens of major mathematical theorems throughout the centuries
> as a result of Godel's rug pulling. :-)
The book has nothing whatever to do with Godel's theorem.
> Jerry, do us a favor and comment on the GPA example and the questions
> I posed you. Take your time. Tell us HOW you think you, or anyone
> else can argue or presuppose what each or ANY of the signs should be.
We had crossing posts.
I made two posts in response to your one because I could respond to some
issues faster than to others. Both of my posts are part of this thread.
I didn't change the subject header.
You've already commented on my first post. My second post answered your
question. I suspect you did what I did, namely, go to bed. If that's
the case, then ignore this post. If your newsreader didn't receive my
response to your GPA question, let me know and I'll repost it.
Bob, on the face of it, you seem to be removing the "deduction" arrows
from Figure A(1) in the Box paper you referred to earlier (Science and
Statistics, JASA 1976, Vol 71, 791-799). Can you clarify?
For the benefit of those who have not seen the paper, Figure A(1) shows
"an iteration between theory and practice", with "induction" arrows
pointing from PRACTICE/DATA/FACTS to
HYPOTHESES/MODEL/CONJECTURE/THEORY/IDEA; and "deduction" arrows pointing
in the opposite direction.
>Jerry Dallal wrote:
>> Reef Fish wrote:
>> > Jerry Dallal wrote:
.......................
>But for PREDICTION purposes, there's nothing to explain. The
>models are NOT causal. It's hidden. Then, the bit about
>explaining the sign or expecting any sign becomes a frivolous
>exercise in futility that leads only to abuse, never
>enlightenment!
One can ask, what is one predicting, and what good are
the predictors?
>YMMV on some OTHER kinds of model building.
>-- Bob.
>There are only FOUR independent variables. There is no steamroller
>to roll nor anything an intelligent data analyst could not do WELL,
>except NO DATA ANALYST worth his salt would make a banker-like
>claim about what SIGN to expect on that simple problem.
>-- Bob.
As far as I can see, there are no "independent" variables.
Not one of the variables listed can be considered in any
way to be causal, with the possible exception of the GRE,
if it was like the old GRE.
--
This address is for information only. I do not claim that these views
are those of the Statistics Department or of Purdue University.
Herman Rubin, Department of Statistics, Purdue University
hru...@stat.purdue.edu Phone: (765)494-6054 FAX: (765)494-0558
>> Godel
>> pulled the rug under the mathematicians about the impossibility of
>> devising a system of mathematics that is completely logically
>> consistent within itself.
> He did nothing of the kind.
What he proved is that one could not have a system of
propositions large enough to do "elementary number theory"
which can be proved logically consistent within the
system itself, unless it is inconsistent, in which case,
anything can be proved.
>> Russell and Whitehead threw up their
>> hands after toiling for 20 years on the Principia Mathematica;
> PM was published in 1913. What do you take to be the relevance of
>Godel's theorem to PM?
>> and Herman and Jean Rubin wrote a little book about the equivalence
>> of dozens of major mathematical theorems throughout the centuries
>> as a result of Godel's rug pulling. :-)
> The book has nothing whatever to do with Godel's theorem.
I agree. The book assumes that the usual axioms for
mathematics are consistent, and proceeds from there. As
far as the age of the theorems, I am unsure if any of them
precedes the 20th century; certainly not many.
Herman Rubin wrote:
> In article <1118810568.0...@z14g2000cwz.googlegroups.com>,
> Reef Fish <Large_Nass...@Yahoo.com> wrote:
>
>
> >There are only FOUR independent variables. There is no steamroller
> >to roll nor anything an intelligent data analyst could not do WELL,
> >except NO DATA ANALYST worth his salt would make a banker-like
> >claim about what SIGN to expect on that simple problem.
>
> >-- Bob.
>
> As far as I can see, there are no "independent" variables.
> Not one of the variables listed can be considered in any
> way to be causal, with the possible exception of the GRE,
> if it was like the old GRE.
Herman,
When you argued that the functional analysis kind of Measure Theory
is necessary for an applied statistician to apply statistics well,
your Mathematistry slip was only showing down to your ankle.
Your comment above removed your slip, shirt, and all guises you put
on to make yourself look like a statistician who knows something
about APPLIED statistics or APPLYING statistics in this regression
analysis context!
-- Bob.
Herman Rubin wrote:
> In article <vcbpsun...@beta19.sm.ltu.se>,
> Torkel Franzen <tor...@sm.luth.se> wrote:
> >"Reef Fish" <Large_Nass...@Yahoo.com> writes:
>
> >> Godel
> >> pulled the rug under the mathematicians about the impossibility of
> >> devising a system of mathematics that is completely logically
> >> consistent within itself.
>
> > He did nothing of the kind.
Oh, but he did EXACTLY that, as I shall show.
>
> What he proved is that one could not have a system of
> propositions large enough to do "elementary number theory"
> which can be proved logically consistent within the
> system itself, unless it is inconsistent, in which case,
> anything can be proved.
Here I am, an APPLIED statistician who had long abandoned the
boring disciplines of mathematics and mathematical statistics,
correcting two MATHEMATICIANS of their errors in their own
subject of MATHEMATICS.
Torkel was totally wrong, and Herman gets a little partial credit.
> >> Russell and Whitehead threw up their
> >> hands after toiling for 20 years on the Principia Mathematica;
>
> > PM was published in 1913. What do you take to be the relevance of
> >Godel's theorem to PM?
The first volume of Principia Mathematica was published in 1910, not
1913. While the three volumes were published in 1910, 1912, and
1913, Russell and Whitehead were not immune to Godel's demonstration
that no complex mathematical system was complete! In other words,
no matter WHAT axioms are chosen, meaningful mathematical statements
can be made whose truth or falsehood cannot be demonstrated within
the system.
This is certainly contrary to Herman's understanding that it was
limited to "elementary number theory". It encompassed the entire
FIELD of mathematics!
> >> and Herman and Jean Rubin wrote a little book about the equivalence
> >> of dozens of major mathematical theorems throughout the centuries
> >> as a result of Godel's rug pulling. :-)
>
> > The book has nothing whatever to do with Godel's theorem.
Unbeknownst to them, perhaps. Perhaps I should have stated that
Godel pulled the rug out of ALL of the equivalent theorems in the
book by the Rubins.
> I agree. The book assumes that the usual axioms for
> mathematics are consistent, and proceeds from there. As
> far as the age of the theorems, I am unsure if any of them
> precedes the 20th century; certainly not many.
It seems you're not even aware of Godel's SECOND discovery that was
more devastating than the first. Godel demonstrated that it was
IMPOSSIBLE to prove that any complex mathematical system was
consistent!
In the words of Paul Hoffman, the publisher of Encyclopedia
Britannica who wrote a biographical book about Paul Erdos,
"On the Richter scale of mathematical discoveries, Godel's
was a 10."
Perhaps the 10 on the Richter scale was exaggerated, because the
shock hadn't reached Torkel or Herman almost a century later!
But Hoffman was right on the mark when he said, "In the wake of
Godel, most card-carrying mathematicians still clung to the
belief that mathematics was in fact free of contradictions,
though they can never prove it."
Now I can come back to Torkel's unfounded retort:
> > PM was published in 1913. What do you take to be the
> > relevance of Godel's theorem to PM?
I am now quoting Hoffman quoting Russell (co-author of PM), who
according to Hoffman, "was crushed" (years LATER):
"... I was continually reminded of the fable about the
elephant and the tortoise. Having constructed an
elephant on which the mathematical world could rest,
I found the elephant to be tottering, and proceeded
to construct a tortoise to keep the elephant from
falling. But the tortoise was no more secure than the
elephant and after some twenty years of very arduous
toil, I came to the conclusion that there was nothing
more that _I_ could do in the way of making mathematical
knowledge indubitable."
That was how Godel's discoveries were relevant to Russell and
Whitehead's Principia Mathematica -- "pull the rug under the
mathematicians" (that was MY way of characterizing that part
of the mathematical history).
Gentlemen, it gives me pleasure to contribute some MATHEMATICAL
FACTS to two mathematicians in a statistics forum, where I
claim NO INTEREST whatsoever in Herman Rubin's type of IRRELEVANT
mathematics toward the APPLICATIONS of Statistics.
-- Bob.
>Herman Rubin wrote:
>> In article <1118810568.0...@z14g2000cwz.googlegroups.com>,
>> Reef Fish <Large_Nass...@Yahoo.com> wrote:
>> >There are only FOUR independent variables. There is no steamroller
>> >to roll nor anything an intelligent data analyst could not do WELL,
>> >except NO DATA ANALYST worth his salt would make a banker-like
>> >claim about what SIGN to expect on that simple problem.
>> >-- Bob.
>> As far as I can see, there are no "independent" variables.
>> Not one of the variables listed can be considered in any
>> way to be causal, with the possible exception of the GRE,
>> if it was like the old GRE.
>Herman,
>When you argued that the functional analysis kind of Measure Theory
>is necessary for an applied statistician to apply statistics well,
>your Mathematistry slip was only showing down to your ankle.
I have never argued for functional analysis, or being able
to prove the theorems of measure theory; this is NOT
enough. But being able to understand them is needed.
>Your comment above removed your slip, shirt, and all guises you put
>on to make yourself look like a statistician who knows something
>about APPLIED statistics or APPLYING statistics in this regression
>analysis context!
Is this a place where a straight regression is worth
anything at all? I am quite aware of the misapplication of
statistics in psychology, and in particular in education,
and this is a good example.
I am quite aware that university administrators take such
regressions as gospel. Regression is overused, because
its value is mistaught.
>-- Bob.
This is well understood.
>This is certainly contrary to Herman's understanding that it was
>limited to "elementary number theory". It encompassed the entire
>FIELD of mathematics!
I said that if the system of propositions is LARGE ENOUGH
to do elementary number theory, this is the case. Tarski
proved that elementary algebra and geometry are decidable
and consistent; the reason is that, by elementary methods, one
cannot define what is an integer. This does not mean that
one cannot decide, by elementary methods, whether a particular
real number is an integer, but one cannot come up with an
elementary procedure for all integers.
>> >> and Herman and Jean Rubin wrote a little book about the equivalence
>> >> of dozens of major mathematical theorems throughout the centuries
>> >> as a result of Godel's rug pulling. :-)
>> > The book has nothing whatever to do with Godel's theorem.
>Unbeknownst to them, perhaps. Perhaps I should have stated that
>Godel pulled the rug out of ALL of the equivalent theorems in the
>book by the Rubins.
We were quite aware that IF mathematics was inconsistent,
the "rug" was pulled out.
>> I agree. The book assumes that the usual axioms for
>> mathematics are consistent, and proceeds from there. As
>> far as the age of the theorems, I am unsure if any of them
>> precedes the 20th century; certainly not many.
>It seems you're not even aware of Godel's SECOND discovery that was
>more devastating than the first. Godel demonstrated that it was
>IMPOSSIBLE to prove that any complex mathematical system was
>consistent!
We were quite aware of this. Also, the point is that it cannot
be proved WITHIN the system. There is lots of mathematics
being done now which can be proved not to be provably
consistent within set theory; everything we did was
provably consistent within set theory. Godel himself
proved that the axiom of choice and the generalized
continuum hypothesis were consistent within set theory.
>In the words of Paul Hoffman, the publisher of Encyclopedia
>Britannica who wrote a biographical book about Paul Erdos,
>"On the Richter scale of mathematical discoveries, Godel's
>was a 10."
>Perhaps the 10 on the Richter scale was exaggerated, because the
>shock hadn't reached Torkel or Herman almost a century later!
>But Hoffman was right on the mark when he said, "In the wake of
>Godel, most card-carrying mathematicians still clung to the
>belief that mathematics was in fact free of contradictions,
>though they can never prove it."
If mathematics has contradictions, they will eventually
show up, and then we will revise the axioms so that those
particular contradictions are taken care of. Why do YOU
think "applied statistics" has any basis? Its entire
basis is mathematics.
>Now I can come back to Torkel's unfounded retort:
>> > PM was published in 1913. What do you take to be the
>> > relevance of Godel's theorem to PM?
>I am now quoting Hoffman quoting Russell (co-author of PM), who
>according to Hoffman, "was crushed" (years LATER):
> "... I was continually reminded of the fable about the
> elephant and the tortoise. Having constructed an
> elephant on which the mathematical world could rest,
> I found the elephant to be tottering, and proceeded
> to construct a tortoise to keep the elephant from
> falling. But the tortoise was no more secure than the
> elephant and after some twenty years of very arduous
> toil, I came to the conclusion that there was nothing
> more that _I_ could do in the way of making mathematical
> knowledge indubitable."
>That was how Godel's discoveries were relevant to Russell and
>Whitehead's Principia Mathematica -- "pull the rug under the
>mathematicians" (that was MY way of characterizing that part
>of the mathematical history).
>Gentlemen, it gives me pleasure to contribute some MATHEMATICAL
>FACTS to two mathematicians in a statistics forum, where I
>claim NO INTEREST whatsoever in Herman Rubin's type of IRRELEVANT
>mathematics toward the APPLICATIONS of Statistics.
Until Godel produced his results, there was a major effort
to prove mathematics consistent. Is Hoffman that great a
mathematician?
In the words of Andre Weil, who is one of the great ones,
God exists, because mathematics is consistent.
The devil exists, because we cannot prove it.
Mathematicians are quite aware of the problem. You seem to
have great faith in the application of restricted parts,
by those who are unaware of the extent of their ignorance.
>-- Bob.
Herman Rubin wrote:
> In article <1119017328.1...@g44g2000cwa.googlegroups.com>,
> Reef Fish <Large_Nass...@Yahoo.com> wrote:
>
>
> >Herman Rubin wrote:
> >> In article <1118810568.0...@z14g2000cwz.googlegroups.com>,
> >> Reef Fish <Large_Nass...@Yahoo.com> wrote:
>
>
> >> >There are only FOUR independent variables. There is no steamroller
> >> >to roll nor anything an intelligent data analyst could not do WELL,
> >> >except NO DATA ANALYST worth his salt would make a banker-like
> >> >claim about what SIGN to expect on that simple problem.
>
> >> >-- Bob.
>
> >> As far as I can see, there are no "independent" variables.
> >> Not one of the variables listed can be considered in any
> >> way to be causal, with the possible exception of the GRE,
> >> if it was like the old GRE.
>
>
> >Herman,
>
> >When you argued that the functional analysis kind of Measure Theory
> >is necessary for an applied statistician to apply statistics well,
> >your Mathematistry slip was only showing down to your ankle.
>
> I have never argued for functional analysis, or being able
> to prove the theorems of measure theory; this is NOT
> enough. But being able to understand them is needed.
You implied it's necessary to understand the Measure Theory USED
in functional analysis before one is equipped to do good applied
statistics -- which is a totally fallacious claim.
>
> >Your comment above removed your slip, shirt, and all guises you put
> >on to make yourself look like a statistician who knows something
> >about APPLIED statistics or APPLYING statistics in this regression
> >analysis context!
>
> Is this a place where a straight regression is worth
> anything at all?
You bet! It's far better than using ONLY the usual information from
the students' application forms, including the predictor variables.
For one thing, it allows the university to have DIFFERENT models
(for predicting the success of admitted students) for different
colleges or majors. For example, students applying to the Engineering
College have a different prediction model from students applying to
the School of Liberal Arts (majoring in subjects that require little
or no quantitative aptitude or skills).
> I am quite aware of the misapplication of
> statistics in psychology, and in particular in education,
> and this is a good example.
>
> I am quite aware that university administrators take such
> regressions as gospel. Regression is overused, because
> its value is mistaught.
Nobody takes such results as gospel, no more so than taking the
results of SAT or GRE (which YOU seem to favor in your own biased
way).
It's REALITY, Herman. It's Reality in the World of Applied
Statistics, used PROPERLY, in prediction. It's used by many
schools, to yield BETTER objective measures of their applicants
than WITHOUT the prediction models.
In the end, non-quantifiable traits DO count, in letters of
recommendation and interests and successes in extracurricular
activities, but none of that lessens the value of using
STATISTICAL modeling to help the selection of applicants.
-- Bob.
But that adds nothing to my assertions of mathematical FACTS that
totally debunked Torkel's rebuttal of my post, and much of what
you said supporting his wrong understanding of the history about
which I posted.
Glad you asked! In Hoffman's words,
"I am not a mathematician. I have never proved a theorem, let
alone offered a surprising conjecture. My Erdos number is infinity."
Yet through his thorough research, he wrote a remarkable book about
Paul Erdos, encompassing the entire HISTORY of mathematics, with
impeccable precision and comprehension of the major mathematical
results. He is BETTER than most of the mathematicians I've known.
You should get a copy of his 1998 book, "The Man Who Loved Only
Numbers" and LEARN some mathematics and mathematical history from
it, besides the life of Paul Erdos.
Paul Hoffman even knew the history behind Weil's proof of Fermat's
Last Theorem, one that stood for more than 350 years without a proof,
and the VITAL role of the Taniyama-Shimura Conjecture (which stood
unproven for decades; in 1986 Ken Ribet showed that it implies
Fermat's Last Theorem).
> In the words of Andre Weil, who is one of the great ones,
>
> God exists, because mathematics is consistent.
>
> The devil exists, because we cannot prove it.
That's the Andre Weil, not to be confused with the Andrew Wiles
who proved Fermat's Last Theorem.
The quote was in Hoffman's book, where your "because" was quoted
as "since".
But the bottom line of this side-thread on Godel is that Torkel
declared this about my assertions about Godel and his influence
on Russell and Whitehead's Principia Mathematica, and Herman
Rubin agreed with Torkel, for the most part -- and Torkel was
totally wrong.
Torkel> >> > He did nothing of the kind.
-- Bob.
Sorry it took so long for me to get back to this request for
clarification. It's always good when someone asks for clarification
rather than start arguing on a wrong premise as we've seen so often
in this ng.
I didn't remove the "deduction" arrows from Figure A(1) in the Box
paper.
I am clarifying it FOR Box here. :-) The top part of the figure
you referred to was a POOR (and misleading, since it misled you)
diagram trying to convey the ITERATIVE process of "critic" and
"sponsor" by the same model-builder.
The bottom diagram labeled as Figure A(2) was describing the same
process as "A Feedback Loop".
The Inductive part finds places where the model needs to be modified
before any inference or hypotheses can be validly tested. For example,
if the residuals don't behave as postulated, either because they
violate the iid Normal assumption or because the FUNCTION model is
wrong (a straight line fitted to a curve, or a hyperplane fitted to a
nonlinear surface), then no statistical DEDUCTION based on the error
model can be valid.
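A minimal Python sketch of that inductive check, assuming synthetic data: a straight line is fitted by least squares and its residuals are correlated with the squared, centered predictor, which is one crude signal of curvature. The helper names (`fit_line`, `corr`, `curvature_signal`) are hypothetical, the sine "wiggle" is a deterministic stand-in for random error so the run is reproducible, and none of this is Box's procedure or the IDA package; in practice one would look at residual plots first.

```python
import math

# Hypothetical helpers for illustration only (not from IDA, SPSS, or Box's paper).

def fit_line(x, y):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def corr(u, v):
    """Pearson correlation of two equal-length sequences."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    suv = sum((p - mu) * (q - mv) for p, q in zip(u, v))
    suu = sum((p - mu) ** 2 for p in u)
    svv = sum((q - mv) ** 2 for q in v)
    return suv / math.sqrt(suu * svv)

def curvature_signal(x, y):
    """Correlate straight-line residuals with squared centered x.
    Near 0: no sign of curvature. Near +/-1: the line is the wrong function."""
    a, b = fit_line(x, y)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    mx = sum(x) / len(x)
    return corr(resid, [(xi - mx) ** 2 for xi in x])

x = [i / 10 for i in range(-20, 21)]              # x from -2.0 to 2.0
wiggle = [0.3 * math.sin(7 * xi) for xi in x]     # deterministic stand-in for error
y_line = [1 + 2 * xi + w for xi, w in zip(x, wiggle)]   # truly straight
y_curve = [xi ** 2 + w for xi, w in zip(x, wiggle)]     # truly curved

print(f"straight-line data: {curvature_signal(x, y_line):+.3f}")
print(f"curved data:        {curvature_signal(x, y_curve):+.3f}")
```

For the straight-line data the signal should be essentially zero; for the curved data it should be close to +1, flagging that the FUNCTION model is wrong before any DEDUCTION is attempted.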
The DEDUCTIVE part (interval estimation or hypothesis tests of the
unknown parameters in the original or modified model, prediction
intervals, etc.) after the residuals are KNOWN to behave as they
should, enables the model builder to assess OTHER aspects of the
tentative model -- and the one I always tell my students to
examine carefully is the "practical significance" rather than
the "statistical significance" of the result! The DEDUCTIVE
inference often leads to further modification of the tentative
model under the iterative process.
But one thing I am SURE Box did NOT mean by his "deductive"
inference on hypotheses was to use regression results to EXPLAIN
a phenomenon or ascertain CAUSAL effects as malpractitioners of
regression methods often do, from observational data and the
FITTING (modeling) thereof, WITHOUT the necessary prerequisite
of appropriately designed experiments and controls.
I hope that clarified the inductive/deductive parts of Box's
figure A for you.
For another occasion, I can post an article from my Lecture Note
Chapter on "Statistical significance" vs "Practical Significance"
in which I took a data set from the SPSS Manual in the early 1970s
(p. 359 of the 1975 SPSS Manual, to be precise) in which the multiple
regression results (with three independent variables) were shown
by SPSS to be statistically highly significant in every respect,
but the Manual failed to notice that the model was completely useless in PRACTICE,
on the basis of prediction intervals.
My lesson was to show that when properly analyzed through the
"model building" process facilitated by the IDA package, a SIMPLE
regression model, using only ONE of the three variables and the
same data given in the manual, did a better job (in every respect)
than the multiple regression model shown in the SPSS Manual.
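The gap between "statistically significant" and "useful in PRACTICE" can be illustrated with synthetic data. The sketch below does NOT use the SPSS Manual data set (which is not reproduced here); it simulates a hypothetical simple regression with a weak slope and large noise, then compares the slope's t statistic with the width of a 95% prediction interval for a new observation. The helper names are hypothetical, and the 1.96 multiplier is a large-sample normal approximation to the t quantile with n-2 degrees of freedom, an assumption that is only reasonable for large n.

```python
import math
import random

def simple_ols(x, y):
    """Least-squares fit of y = a + b*x.
    Returns intercept a, slope b, residual SD s, mean of x, and Sxx."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))
    return a, b, s, mx, sxx

def prediction_interval(x, y, x0, z=1.96):
    """Approximate 95% prediction interval for a NEW observation at x0.
    z=1.96 stands in for the t quantile (large-n assumption)."""
    n = len(x)
    a, b, s, mx, sxx = simple_ols(x, y)
    half = z * s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    yhat = a + b * x0
    return yhat - half, yhat + half

rng = random.Random(1)                            # fixed seed: reproducible run
n = 2000
x = [rng.gauss(0, 1) for _ in range(n)]
y = [0.15 * xi + rng.gauss(0, 1) for xi in x]     # weak slope, large noise

a, b, s, mx, sxx = simple_ols(x, y)
t_slope = b / (s / math.sqrt(sxx))                # t statistic for H0: slope = 0
lo, hi = prediction_interval(x, y, 1.0)
my = sum(y) / n
sd_y = math.sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))

print(f"slope = {b:.3f}, t = {t_slope:.1f}")
print(f"95% prediction interval at x = 1: ({lo:.2f}, {hi:.2f}), width {hi - lo:.2f}")
print(f"width of mean(y) +/- 1.96*SD(y), ignoring x: {2 * 1.96 * sd_y:.2f}")
```

With these settings the slope should sit many standard errors away from zero, yet the prediction interval comes out only a few percent narrower than the band obtained by ignoring x altogether: "highly significant" and practically useless for predicting an individual case.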
Sounds astounding? It is. And it was in black and white. I know
some of you are curious about this, but you have to wait for
another time and another place for it.
Any of you model-builders can take this hint and analyze that
dataset yourself and see how you do. I gave the exact citation
in the 1975 Manual.
-- Bob.
Here, I meant to say Wiles's (Andrew Wiles) and not "Weil" (Andre
Weil) -- an amazing coincidence of two men -- the former, a great
mathematician, and the latter a non-mathematician, should be
featured in Hoffman's book about Erdos, in the role Godel played
in mathematics!
Hoffman said of Godel, "Genius though he was, Godel was not
exactly a poster boy for mathematical sanity."
That remark was made with respect to the running theme in the book
which culminated in the final chapter of the book, "We mathematicians
are all a little crazy" in which Hoffman immediately disclaimed himself
to be a non-mathematician who had never proved a theorem. :-)
MOST of the mathematicians featured in the book were more than just
"a little crazy". David Hilbert and John Nash are two well-known
mathematicians who are MORE than just a little crazy.
Andre Weil was a "number theorist extraordinaire", and Ron Graham's
thesis adviser -- and Ron (who headed Bell Labs' Think Tank of
90-odd mathematicians and statisticians for some years) was Paul
Erdos's banker, accountant, baby sitter, etc., because Paul's entire
life's possessions were contained in one briefcase, and his mathematical
papers in the other, and Paul Erdos never learned how to tie his
own shoes or turn off the water in a bath even at the age of 70. :-)
Paul Erdos was more odd and eccentric than he was crazy though.
But Andrew Wiles was by far the "dragon slayer" in mathematics
when he slew the more-than-350-year-old dragon of "Fermat's Last Theorem."
This is my CORRECTION NOTE, about misattributing Andrew Wiles'
work to Andre Weil, although later in my post, I correctly
identified and distinguished them:
RF> That's the Andre Weil, not to be confused with the Andrew Wiles
RF> who proved Fermat's Last Theorem.
-- Bob.
> Unbeknownst to them, perhaps. Perhaps I should have stated that
> Godel pulled the rug out of ALL of the equivalent theorems in the
> book by the Rubins.
What, if anything, do you mean by the metaphor "pulled the rug out
of"?
> It seems you're not even aware of Godel's SECOND discovery that was
> more devastating than the first. Godel demonstrated that it was
> IMPOSSIBLE to prove that any complex mathematical system was
> consistent!
He did nothing of the kind.
> That was how Godel's discoveries were relevant to Russell and
> Whitehead's Principia Mathematica -- "pull the rug under the
> mathematicians" (that was MY way of characterizing that part
> of the mathematical history).
What, if anything, do you take the quoted comment by Russell to have
to do with Gödel's theorem?
[various stuff about Gödel's theorem]
I wrote a short response, which I cancelled, because it was really
pretty pointless and grouchy. So in the perhaps unlikely event that
you are at all interested in having your various misconceptions
corrected, let me just point you to my recent book, which explains
in tedious detail where you go wrong:
That Russell was devastated by Godel's discovery that "completeness"
and "consistency" can never be proven within an axiomatic system,
the very thing Russell and Whitehead were trying to establish in their
Principia Mathematica and in years of work thereafter, until
Godel came along?
I thought that was rather plain English by Russell, and that was
exactly the way Hoffman meant for it to mean!
Perhaps an additional quote from the book will make it clearer for YOU?
I am quoting Hoffman now, regarding Godel's discovery that no
axiomatic system of mathematics can ever be proved to be consistent.
"Erdos was having too good a time solving problems to worry about the
philosophical underpinning of the enterprise.
Russell, on the other hand, was crushed.
< now Hoffman was quoting Russell >
I wanted certainty in the kind of way in which people want
religious faith. I thought that certainty was more likely to
be found in mathematics than elsewhere. < ... >
After some twenty years of very arduous toil, I came to the
conclusion that there was nothing more that I could do in
the way of making mathematical knowledge indubitable."
Which part of it did you not understand?
This was more DIRECT in an earlier part of the book:
"Russell and Alfred North Whitehead responded to Hilbert's call.
Like Frege before them, they tried to build up all of mathematics
from first principles in three impenetrable volumes of Principia
Mathematica. The first volume was published in 1910. The project
went along swimmingly for two decades, until young Godel derailed
it."
The quote of Russell was to show how "crushed" Russell was by this
derailment.
That's as much as I am going to explain to you or anyone else.
If you still don't see why you were TOTALLY wrong about this,
there is no more I can do for YOU, as there was nothing more
Russell could do "to make mathematics indubitable."
-- Bob.
> That Russell was devastated by Godel's discovery that "completeness"
> and "consistency" can never be proven within an axiomatic system,
Actually, Russell's comment was in no way prompted by or connected
with Gödel's theorem. These historical matters are not treated in
the book I referred to, so I suggest you look at the archives of
the Russell-L mailing list, where the topic has been discussed from
time to time.
>> >> >-- Bob.
>> >Herman,
The regression cited will have the typical flaws. It will
NOT detect good students. In fact, quite a few high schools
in Indiana will no longer give out class ranks or GPAs, because
of the way they will necessarily be misused, if they are used
at all; these include most of the good schools.
Good students try to LEARN, not to get high grades. Try
to find these!
>For one thing, it allows the university to have DIFFERENT models
>(for predicting the success of admitted students) for different
>colleges or majors. For example, students applying to the Engineering
>College has a different prediction model from students applying to
>the School of Liberal Arts (majoring in subjects that require little
>or no quantitative aptitude or skills).
The predictors cited, and what is predicted, do nothing of
the sort. Why should engineering or science schools care
about average grades in a year where half are in humanities
courses? And the School of Liberal Arts is now finding
that there is more need for mathematical concepts, NOT
the irrelevant skills tested in the high schools.
>> I am quite aware of the misapplication of
>> statistics in psychology, and in particular in education,
>> and this is a good example.
>> I am quite aware that university administrators take such
>> regressions as gospel. Regression is overused, because
>> its value is mistaught.
>Nobody takes such results as gospel, no more so than taking the
>results of SAT or GRE (which YOU seem to favor in your own biased
>way).
You are unaware of the viewpoint of those on admissions
committees.
>It's REALITY, Herman. It's Reality in the World of Applied
>Statistics, used PROPERLY, in prediction. It's used by many
>schools, to yield BETTER objective measures of their applicants
>than WITHOUT the prediction models.
Considering what goes on in the high schools, it is needed
to give good long tests. The old SAT gave a reasonable
estimate of intelligence. The high school courses of a
half century ago gave some insight as well. But with the
overemphasis on skills and rote, we can now trust nothing.
I cannot even trust the grades from Purdue as meaning
anything of importance.
BTW, several years ago I saw an ETS study of predictors of
graduate success. The multiple R was just over .5, with
almost all of this coming from how many schools the
applicant applied to.
>In the end, non-quantifiable traits DO count, in letters of
>recommendation and interests and successes in extracurricular
>activities, but none of that lessens the value of using
>STATISTICAL modeling to help the selection of applicants.
They do? It is hard to evaluate those chicken tracks even
if you know which chicken is making them.
Torkel Franzen wrote:
> "Reef Fish" <Large_Nass...@Yahoo.com> writes:
>
> > That Russell was devastated by Godel's discovery that "completeness"
> > and "consistency" can never be proven within an axiomatic system,
>
> Actually, Russell's comment was in no way prompted by or connected
> with Gödel's theorem. These historical matters are not treated in
> the book I referred to,
Because your book was only about Godel's theorem(s) on "completeness".
And your knowledge about Russell seems limited to what you referenced
in the Russell-L list -- that's really laughable.
> so I suggest you look at the archives of
> the Russell-L mailing list, where the topic has been discussed from
> time to time.
I don't have the Hoffman book with me now. But I recall his quote
about Russell was based on Russell's own book in the 1950s.
Russell was talking about how he felt about Godel's discoveries
and how crushed he was! (Hoffman gave very detailed attributions
<about 10 pages worth> in his book about the SOURCES of many of
his FACTS).
Why should anyone believe a discussion LIST, Russell-L, with no
attribution to any facts, over Russell HIMSELF or the documented
research in a book by an editor of the Encyclopedia Britannica?
-- Bob.
> Russell was talking about how he felt about Godel's discoveries
> and how crushed he was!
Alas, these are mere fantasies.
Torkel Franzen wrote, AFTER my latest post, without quoting a
single word from it, but quoting from an older post:
You should have posted THESE:
RF> And your knowledge about Russell seems limited to what you referenced
RF> in the Russell-L list -- that's really laughable.
RF> I don't have the Hoffman book with me now. But I recall his quote
RF> about Russell was based on Russell's own book in the 1950s.
RF> Russell was talking about how he felt about Godel's discoveries
RF> and how crushed he was! (Hoffman gave very detailed attributions
RF> <about 10 pages worth> in his book about the SOURCES of many of
RF> his FACTS).
After I get home later today, I'll cite the book by Russell, referenced
in Hoffman's book, from which Hoffman took Russell's quote about how
he felt about Godel's work.
Why should anyone believe a discussion LIST, Russell-L, with no
attribution to any facts, over Russell HIMSELF or the documented
research in a book by an editor of the Encyclopedia Britannica?
Fantasies of Russell? Did you EVER read THAT book of his?
I very seriously doubt it. But why should that keep you from
making your various totally false statements about Russell
which were TOTALLY debunked by Hoffman and by Russell HIMSELF?
-- Bob.
> Why should anyone believe a discussion LIST, Russell-L, with no
> attribution to any facts, over Russell HIMSELF or the documented
> research in a book by an editor of the Encyclopedia Britannica?
You're imagining things. The subject of Gödel and Russell is
well-researched. There are very few comments that Russell made
about Gödel. His views about the failure of Principia to
establish the certainty of mathematics have nothing whatever
to do with Gödel.
<Large snip>
> For another occasion, I can post an article from my Lecture Note
> Chapter on "Statistical significance" vs "Practical Significance" in
> which I took a data set from the SPSS Manual in the early 1970s (p.
> 359 of the 1975 SPSS Manual, to be precise) in which the multiple
> regression results (with three independent variables) were shown by
> SPSS to be statistically highly significant in every respect, while the
> Manual failed to notice that the model was completely useless in
> PRACTICE, on the basis of prediction intervals.
> My lesson was to show that when properly analyzed through the "model
> building" process facilitated by the IDA package, a SIMPLE regression
> model, using only ONE of the three variables and the same data given
> in the manual, did a better job (in every respect) than the multiple
> regression model shown in the SPSS Manual.
> Sounds astounding? It is. And it was in black and white. I know
> some of you are curious about this, but you have to wait for another
> time and another place for it.
> Any of you model-builders can take this hint, analyze that dataset
> yourself, and see how you do. I have given the exact citation to the
> 1975 Manual.
Totally lacking access to anything to do with SPSS, or even SAS, I would
be very pleased to be told how I might get hold of that data set.
Sounds like an interesting and instructional exercise to try analysing
it from a standing start.
Robin
<Further snip>
Because YOU said so?
You have produced NO evidence in support of YOUR assertion other
than your unsubstantiated statements.
I am at home now, and can cite the reference Hoffman gave, on
HIS quote of Russell, from Russell's book.
The lengthy quote was taken from p. 53 of
Russell, Bertrand (1956). Portraits from Memory and Other
Essays. Allen and Unwin.
describing how Russell was crushed by Godel's discoveries.
The OTHER books by Russell that Hoffman cited for his other
comments about Godel and Russell are
Russell, Bertrand (1967). The Autobiography of Bertrand
Russell, 1872-1914. Atlantic Monthly Press.
Russell, Bertrand (1959). My Philosophical Development.
Allen and Unwin.
I found Paul Hoffman's research, and the meticulously kept hand-written
notes and attributions in his book, to be factually accurate AFAIK,
and completely credible.
Have you read any of those books OR the passage cited by Hoffman?
Besides having to take up the matter of your disagreement with
Hoffman, you have not shown a SINGLE attribution to support YOUR
irresponsible statements about Russell, in the light of the
supporting material given by Hoffman.
Why don't you give some EVIDENCE that
Tortel> There are very few comments that Russell made about Gödel.
and that those few comments led you to challenge Paul Hoffman's
characterization and specific quotes by Russell as Russell's
comments about Godel and Godel's influence on the Russell-Whitehead
volumes of Principia Mathematica?
You don't even know how to attribute MY POSTS and what I said in
them. Let's see some citation from you about Russell, FROM
Russell to support YOUR imagination.
-- Bob.
I know there are quite a few SPSS users in these newsgroups. Some of
them may even be employees of SPSS, Inc. If any of them, via library
or their own copies of SPSS Manuals can scan/transfer the DATA on
page 359 of the SPSS Manual of 1975, and post it, I think it should
be of interest to many readers, besides yourself.
It's the ANNUAL data from 1935 to 1966 on Investors Index (1949 = 100),
the dependent variable; and GNP, Corporate Profits before taxes,
and Corporate Dividends paid as the three independent variables in
the multiple regression.
If no one has access to them, I'll manually type and post the
data in a future post.
-- Bob.
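Bob's point (regression tests that are "highly significant" while the prediction intervals are too wide to be of practical use) is easy to check once the data are in hand. The sketch below is a minimal illustration, assuming Python with NumPy, of the standard OLS prediction interval for a single new observation, yhat +/- t * s * sqrt(1 + x0'(X'X)^(-1) x0), applied to a three-predictor model and a one-predictor model. The series names and all numbers are hypothetical: the data are randomly generated stand-ins, NOT the Investors Index data from the 1975 SPSS Manual, and the printed results say nothing about that data set. The t quantiles are typed in from a t table rather than computed.

```python
import numpy as np


def ols_fit(X, y):
    """Least-squares fit of y = X b + e; X must already contain an intercept column."""
    n, p = X.shape
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    s2 = float(resid @ resid) / (n - p)            # residual mean square (MSE)
    xtx_inv = np.linalg.inv(X.T @ X)               # (X'X)^(-1), needed for the interval
    r2 = 1.0 - float(resid @ resid) / float(np.sum((y - y.mean()) ** 2))
    return b, s2, xtx_inv, r2


def prediction_interval(x0, b, s2, xtx_inv, t_crit):
    """Interval for a single NEW observation at x0 (x0 includes the leading 1)."""
    yhat = float(x0 @ b)
    # Var(new y - yhat) = s2 * (1 + x0' (X'X)^(-1) x0): new noise plus estimation error.
    se_pred = float(np.sqrt(s2 * (1.0 + x0 @ xtx_inv @ x0)))
    return yhat - t_crit * se_pred, yhat, yhat + t_crit * se_pred


# SYNTHETIC stand-in data (hypothetical), NOT the 1975 SPSS Manual data set.
rng = np.random.default_rng(1975)
n = 32                                          # 1935-1966 gives 32 annual values
gnp = np.linspace(100.0, 700.0, n)              # hypothetical trend
profits = 0.1 * gnp + rng.normal(0.0, 5.0, n)   # hypothetical, nearly collinear with gnp
dividends = 0.4 * profits + rng.normal(0.0, 2.0, n)
index = 0.2 * gnp + rng.normal(0.0, 15.0, n)    # hypothetical response

designs = {
    "three predictors": np.column_stack([np.ones(n), gnp, profits, dividends]),
    "one predictor": np.column_stack([np.ones(n), gnp]),
}
# 97.5% t quantiles from a t table: df = 32 - 4 = 28 and df = 32 - 2 = 30.
t_table = {"three predictors": 2.048, "one predictor": 2.042}

results = {}
for name, X in designs.items():
    b, s2, xtx_inv, r2 = ols_fit(X, index)
    lo, yhat, hi = prediction_interval(X[-1], b, s2, xtx_inv, t_table[name])
    results[name] = (r2, lo, yhat, hi)
    print(f"{name}: R^2 = {r2:.3f}, 95% PI at last year = [{lo:.1f}, {hi:.1f}]"
          f" (half-width {(hi - lo) / 2:.1f} around {yhat:.1f})")
```

With the real data, one would replace the synthetic arrays with the 32 annual values and judge each model by how wide its interval is relative to the scale of the Investors Index, since a tiny p-value says nothing about whether a forecast is precise enough to use.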
> describing how Russell was crushed by Godel's discoveries.
This is just your little fantasy, for which you shouldn't blame
Hoffman.
If you call the facts "fantasy", then you should have said that's
"Hoffman's little fantasy".
You have produced NO facts, NO attribution, NO rebuttal of facts,
and have posted absolutely NO substance on the subject other than
your ad hominem statements.
I am through with you, Torkel Franzen!
You have proved yourself over and over again in this subthread
about Russell and Godel that you knew ABSOLUTELY NOTHING about
what Hoffman wrote, or what Russell wrote (in his three books)
because your total knowledge was what little you read in the
Russell-L list, which obviously did NOT address the relation
between Russell and Godel, as Hoffman researched.
Buzz off with YOUR fantasy.
-- Bob.
> You have proved yourself over and over again in this subthread
> about Russell and Godel that you knew ABSOLUTELY NOTHING about
> what Hoffman wrote, or what Russell wrote (in his three books)
> because your total knowledge was what little you read in the
> Russell-L list, which obviously did NOT address the relation
> between Russell and Godel, as Hoffman researched.
I take it you didn't find any actual statement of Russell's
referring to Gödel. This is not surprising, since there are very
few such. But then, why just repeat the claim that "Russell was
crushed by Godel's discoveries", which is after all pure invention?
The relevance of the incompleteness theorem to the logicist
program is moot, but Russell's own conclusion that Principia failed
to establish the certainty of mathematics had nothing to do with
either completeness or consistency. Rather, he observed that some of
the axioms used in the system of PM could by no means be called
logically true. In particular, the axiom of infinity was treated in PM
as purely hypothetical, and Russell's view of the axiom of
reducibility was that it had no justification whatsoever except the
pragmatic one that it seemed to be needed.
It was entirely reasonable for Russell not to attach any great
significance to the incompleteness theorem as far as his own work
was concerned, but it is quite possible that this was partly based
on his never fully understanding the theorem. Thus in the 1944
Schilpp volume, he says (in the "Addendum to Reply to Critics")
"He [Godel] proved that, in any systematic logical language, there are
propositions which can be stated, but cannot be either proved or
disproved. ...I had always supposed there are propositions in
mathematical logic which can be stated, but neither proved or
disproved. Two of these had a fairly prominent place in _Principia
Mathematica_, namely, the axiom of choice and the axiom of
infinity."
As will be seen, this strongly suggests some misunderstanding on
Russell's part. However, in view of how little he said about the
incompleteness theorem, it would be rash to ascribe any definite views
to Russell.
Torkel Franzen wrote:
> "Reef Fish" <Large_Nass...@Yahoo.com> writes:
>
> > You have proved yourself over and over again in this subthread
> > about Russell and Godel that you knew ABSOLUTELY NOTHING about
> > what Hoffman wrote, or what Russell wrote (in his three books)
> > because your total knowledge was what little you read in the
> > Russell-L list, which obviously did NOT address the relation
> > between Russell and Godel, as Hoffman researched.
>
> I take it you didn't find any actual statement of Russell's
> referring to Gödel. This is not surprising, since there are very
> few such. But then, why just repeat the claim that "Russell was
> crushed by Godel's discoveries", which is after all pure invention?
Call that Hoffman's invention if you wish, but he had supporting
facts and reasons for his "invention".
>
> The relevance of the incompleteness theorem to the logicist
> program is moot, but Russell's own conclusion that Principia failed
> to establish the certainty of mathematics had nothing to do with
> either completeness or consistency.
Now we FINALLY get you to try to give the reasons for YOUR fantasy.
Has nothing to do with "consistency"? You're missing even MORE than
I thought.
Even Herman Rubin, who supported your view initially, backed down
from following your blind lead:
RF>It seems you're not even aware of Godel's SECOND discovery that was
RF>more devastating than the first. Godel demonstrated that it was
RF>IMPOSSIBLE to prove that any complex mathematical system was
RF>consistent!
Which is what Principia Mathematica was TRYING to do, to construct a
consistent system from first principles.
HR> We were quite aware of this.
What you wrote below contradicted yourself.
> Rather, he observed that some of
> the axioms used in the system of PM could by no means be called
> logically true. In particular, the axiom of infinity was treated in PM
> as purely hypothetical, and Russell's view of the axiom of
> reducibility was that it had no justification whatsoever except the
> pragmatic one that it seemed to be needed.
>
> It was entirely reasonable for Russell not to attach any great
> significance to the incompleteness theorem as far as his own work
> was concerned, but it is quite possible that this was partly based
> on his never fully understanding the theorem. Thus in the 1944
> Schilpp volume, he says (in the "Addendum to Reply to Critics")
>
> "He [Godel] proved that, in any systematic logical language, there are
> propositions which can be stated, but cannot be either proved or
> disproved. ...I had always supposed there are propositions in
> mathematical logic which can be stated, but neither proved or
> disproved. Two of these had a fairly prominent place in _Principia
> Mathematica_, namely, the axiom of choice and the axiom of
> infinity."
That added nothing to what Hoffman had said in his book. That's why I
had also said Godel had pulled the rug out of the Rubins' book.
Of course there are proofs within the mathematical system, but if certain
propositions can never be proved or disproved within the system ITSELF
(which Russell was trying to overcome in Russell's example of the
paradox of the barber of Seville), then the system is not "consistent".
Since Godel had shown that Russell's consistent system is IMPOSSIBLE
to prove -- that there will always be statements like Russell's
paradox that cannot be proved or disproved -- THAT was the reason
Russell was "crushed" according to Hoffman's exposition supported
by his cited references.
>
> As will be seen, this strongly suggests some misunderstanding on
> Russell's part.
Now you have progressed from calling the fact MY "fantasy" to
Paul Hoffman's "fantasy", and finally to "misunderstanding on
Russell's part" -- Russell's fantasy?
> However, in view of how little he said about the
> incompleteness theorem, it would be rash to ascribe any definite views
> to Russell.
I had already said, Godel's incompleteness theorem was ALL you knew.
But the devastating blow to Russell was Godel's discovery of
"INCONSISTENCY" of which you seem to be blissfully unaware.
Let me refresh your memory on what I had already posted, in reply
to Herman Rubin:
RF > It seems you're not even aware of Godel's SECOND discovery that was
RF > more devastating than the first. Godel demonstrated that it was
RF > IMPOSSIBLE to prove that any complex mathematical system was
RF > consistent!
The lack of "consistency" was discovered LONG after Godel's proof
of "incompleteness" in any axiomatic system.
In the words of Paul Hoffman, about the "inconsistency" discovery,
RF> "On the Richter scale of mathematical discoveries, Godel's
RF> was a 10."
RF> But Hoffman was right on the mark when he said, "In the wake of
RF> Godel, most card-carrying mathematicians still clung to the
RF> belief that mathematics was in fact free of contradictions,
RF> though they can never prove it."
You are certainly one of those card-carrying mathematicians.
Your latest post at least gave reasons why YOU misunderstood Hoffman
as well as misunderstanding Russell, and you even charged that
"strongly suggests some misunderstanding on Russell's part."
Sure, everybody misunderstood except Tortel, even Russell about
his own work! LOL.
The only FACT that can be inferred from your post is that you
never READ any of the three books by Russell which Hoffman
referenced as his SOURCE, and you could not cite ANYTHING from
the books by Russell to contradict what Hoffman carefully
researched and documented.
That should pretty much wrap things up on Tortel on Russell
and Godel.
-- Bob.
> But the devastating blow to Russell was Godel's discovery of
> "INCONSISTENCY" of which you seem to be blissfully unaware.
I'm afraid you're quite confused.
You should at least learn to cite the POST from which you always
quote a single line out of context.
I am afraid you are VERY confused because of your ignorance about
Godel's result on the "inconsistency" of any axiomatic system,
and because of your seeming attention deficit and lack of comprehension
in reading my post, which you deleted in its entirety. I am restoring
the substance of what I posted, below, for the benefit of other
readers who might have jumped in at this juncture of the
discussion.
-- Bob.
Torkel Franzen wrote:
But the devastating blow to Russell was Godel's discovery of
"INCONSISTENCY" of which you seem to be blissfully unaware.
Let me refresh your memory on what I had already posted, in reply
[...]
Both Gödel's theorem and Russell's work in logic are very peripheral
subjects from the point of view of mathematical statistics, and there
is no reason why you should know or wish to learn anything about
them. It is perhaps a bit odd that you have this bee in your bonnet
about insistently putting forward your various misconceptions, but
such things are fairly common on the net.
You have stated several of YOUR misconceptions already!
While it is true that Godel and Russell's works are very peripheral
to my main professional interest, the rest of your statements are
ALL WRONG!
I am NOT interested in Mathematical Statistics. I know MORE about
Mathematics than most mathematicians who have doctorates in
mathematics, because I have taken more mathematics courses than
would have been required to get a Ph.D. in mathematics.
I am a man of many interests, not the least of which is LOGICAL
arguments and supporting reasons and substantiation on ANY
subject.
You are lacking in those areas, and your knowledge of mathematics
or the history of mathematics seems extremely shallow, and you
have proven that you are using nearly all the known Informal
Fallacies in LOGIC in your arguments, including your present
post, without addressing the SUBSTANCE of your own errors.
If you characterize my reasoned arguments (with plenty of
supporting evidence) as "bee in my bonnet", then I must
characterize yours as "burr in your arse".
BTW, I do apologize for my careless reference to you several
times as Tortel. A mistake that turned out to be in your favor.
Torkel Franzen is/was the "Tortel" in ALL cases of my posts, and
is the unworthy mathematician who is deficient in attribution
or reference on matters of mathematics and one who makes
unwarranted and GROSSLY erroneous assumptions about MY interests
and expertise.
There!
-- Bob.
> I am a man of many interests, not the least of which is LOGICAL
> arguments and supporting reasons and substantiation on ANY
> subject.
You conceal this interest with great skill.
Not nearly as much skill as you have shown in demonstrating your bankruptcy in
mathematical knowledge on specific topics (Russell and Godel
in particular).
-- Bob.
> Not nearly as much skill as you have shown in demonstrating your bankruptcy in
> mathematical knowledge on specific topics (Russell and Godel
> in particular).
You're too modest!
Torkel Franzen's posts are mostly ONE-LINERS such as this one,
with vacuous substance in the subject matter.
I finally discovered BOTH the reason for Torkel's shallowness
in his posts as well as his fondness for one-liners of vacuous
substance:
This is Tortel Franzen's DATA in the HISTORY of his USENET postings:
23,400 threads (each thread may contain dozens of posts by Torkel)
in an uncountable (not the mathematical def of uncountable)
number of non-mathematical and noisy groups, such as
alt.religion, alt.atheism, alt.atheism.moderated,
talk.atheism, as well as NUMEROUS swnet.* groups such
as swnet.politik and other groups where one-liners are
about the capacity of the readers' attention span and
the poster's intellectual limit.
1 (ONE) thread in sci.stat.math, this one, since Torkel started
his USENET presence in 1987 (1 post then).
Go back to your Swedish ngs and argue in your one-liner style
with birds of the same feather as your kind,
Super Soul> No, since this is not a "statement of fact"
that is free of subjective opinion.
Torkel> One can of course have subjective convictions about entirely
objective matters!
Super Soul> And it is even worse when someone claims that their
obviously subjective opinions are purely objective facts!
That was the TOTAL content of three posts, and the one-line posts
went on another dozen or more posts in the same argument.
Torkel Franzen, GET A LIFE!
-- Bob.
> Torkel Franzen's posts are mostly ONE-LINERS such as this one,
> with vacuous substance in the subject matter.
Surely you mean "no substance".
> G Robin Edwards wrote:
> >
> > Totally lacking access to anything to do with SPSS, or even SAS, I
> > would be very pleased to be told how I might get hold of that data
> > set. Sounds like an interesting and instructional exercise to try
> > analysing it from a standing start.
> >
> > Robin
> >
> > <Further snip>
> I know there are quite a few SPSS users in these newsgroups. Some of
> them may even be employees of SPSS, Inc. If any of them, via library
> or their own copies of SPSS Manuals can scan/transfer the DATA on
> page 359 of the SPSS Manual of 1975, and post it, I think it should
> be of interest to many readers, besides yourself.
> It's the ANNUAL data from 1935 to 1966 on Investors Index (1949 =
> 100), the dependent variable; and GNP, Corporate Profits before
> taxes, and Corporate Dividends paid as the three independent
> variables in the multiple regression.
> If no one has access to them, I'll manually type and post the data in
> a future post.
Many thanks! I look forward to further postings on this.
Robin
Robin,
I don't see many eager volunteers to help on providing the data.
Let me try to see if that data is STILL used in the current SPSS
Manual. It was used for many years merely as an illustration of the
output of a multiple regression program in SPSS.
Here's my question to CURRENT SPSS users:
Is the Investors Index vs GNP, Corporate Profit, and Corporate
Dividend example still in the SPSS Manual Multiple Regression
section?
If so, perhaps someone with ready-made scanning and web facilities can
help provide that data set.
As mentioned elsewhere, the last time I used SPSS was about 1971.
But that example in the Manual was around for a long time thereafter.
-- Bob.