The following excerpt is from Everitt & Dunn 1992.
Multivariate Statistical Analysis. Chapman & Hall, London
(section 14.8., pp.272-274)
It illustrates some statistical fantasies that
sometimes arise when data goes in search of theory.
Remember those childhood games when you stared at clouds (or
star constellations) and saw all kinds of beasties? The
beasties were merely figments of your imagination -- though
sometimes they looked so real! Anyhow, the situation at hand
could be partly, at least, blamed on seductive nomenclature
of statistical techniques, e.g. "confirmatory" factor
analysis, bootstrap, jackknife, etc.
The bottom line is that all this huffing and puffing about "structural models"
in journals and conference proceedings may not be worth too much.
enjoy,
james ssemakula
uc riverside
usual disclaimers (but I volunteer to accept a large check from the publishers
and/or authors or preferably a thick wad of dollar bills...)
====================begin excerpt====================
Causal Models and Latent Variables - Myths and Realities
During the last decade 'causal analysis' has spread like
wildfire across the fields of quantitative social science.
Journals of economics, sociology, political science and
psychology are aflame with path diagrams, structural
equation systems, confirmatory factor analysis and something
called LISREL.
Arthur S. Goldberg (1983)
Like many methodological advances before it, the
fusion of factor analysis, multiple regression and
simultaneous equation models achieved in the 1970s by
Joreskog, Bentler and Browne has been received by many
social and behavioural scientists with an enthusiasm which
has often overcome their usual critical faculties. In
disciplines such as psychology and sociology still seeking
the first steps along the road to some type of unifying
theory, the suggestion that causal inferences might be
demonstrable from correlational data combined with the
arrival of a well-documented and forcefully marketed piece
of software which makes the application of the procedure
relatively simple, has clearly been hard to resist. That
old, but still apposite aphorism, 'correlation does not
imply causation' often appears to have been conveniently
forgotten amongst the mass of path diagrams, parameter
estimates and models for everything from theories of
intelligence to sexual behaviour. Latent variables are
given names and tested as if they had an independent
existence and might even be manipulated if necessary. In
many cases little attention is given to the purpose that the
causal model finally settled upon is intended to serve.
All of this is, of course, an extremely
unsatisfactory state of affairs, and it is little wonder
then that thinly veiled criticisms such as that implicit in
the quotation given at the beginning of this section are now
beginning to appear. Dealing first with the issue of
causality, it is important to recognise that seldom (perhaps
never) do structural equation models provide any direct test
of their causal assumptions; they are best seen as
convenient mathematical fictions which describe the
investigator's belief about the causal structure of a set of
variables of interest. But however convincing, respectable
and reasonable a path diagram and its associated model may
appear, any causal inferences extracted are rarely more than
a form of statistical fantasy. The only satisfactory way to
demonstrate causality would be through the active control of
variables. As pointed out by Cliff (1983), with
correlational data it is simply not possible to isolate the
empirical system sufficiently so that the nature of the
relationships among variables can be unambiguously
ascertained. Of course, many investigators proposing causal
models might argue that they are using the term causal in a
purely metaphorical fashion. As pointed out by de Leeuw
(1985) such a cavalier attitude towards terminology becomes
hard to defend if, for example, educational programs are
based on your metaphors, such as the metaphor that
'intelligence is largely genetically determined' or
'allocation of resources to schools has only very
minor impact on the careers of students.'
Essentially so-called causal models simply
provide a parsimonious description of a set of correlations.
This is made explicit in the work of Kuveri and Speed
(1982) who demonstrate that such models are equivalent to
conditional independence statements. Consequently the use
of a package such as LISREL is equivalent to a search
amongst conditional independence models for a model with
good fit, high explanatory power and which is parsimonious.
Such a search is partly guided by the objective goodness-of-fit
procedures described in the previous section, but also partly
by prior knowledge, interpretability etc. Such a combination
of objective and subjective procedures makes it difficult to
believe that two independent researchers will come up with
the same model in any but the simplest of situations. Many
of these problems stem of course from the relative lack of
well-specified causal theories in the social sciences.
So if the 'causal' in causal modelling is usually
a misnomer, is the concept of the latent variable more
satisfactory? Well, in one sense latent variables can never
be anything more than is contained in the observed variables
and never anything beyond what has been specified in the
model. For example, in the statement that verbal ability
is whatever certain tests have in common, the empirical
meaning is nothing more than a shorthand for the observation
of the correlations. It does not mean that verbal ability
is a variable that is measurable in any manifest sense. In
fact latent variables are essentially hypothetical
constructs invented by a scientist for the purpose of
understanding some research area of interest, and for which
there exists no operational method for direct measurement.
Consequently a question that needs to be asked
is can science advance by inferences based upon hypothetical
constructs that cannot be measured or empirically tested?
According to Lenk (1986) the answer is a resounding -
sometimes. For example, atoms in the 18th and 19th
centuries were hypothetical constructs which allowed the
foundation of thermodynamics; gravity is a further example
from physics. Clearly a science can advance using the
concept of a latent variable, but their importance is not
their 'reality' or otherwise but rather to what extent the
models of which they are a part are able to describe and
predict phenomena (Lakatos, 1977). This point is nicely
summarised by Fergusson and Horwood (1986).
Scientific theories describe the properties of
observed variables in terms of abstractions which summarize
and make coherent the properties of observed variables.
Latent variables are, in fact, one of this class of
abstract statements and the justification for the use of
these variables lies not in an appeal to their 'reality' or
otherwise but rather to the fact that these variables serve
to synthesize and summarize the properties of observed
variables.
This point was also made by the participants in
the Conference on Systems under Indirect Observation who
concluded, after some debate (see Bookstein, 1982), that
latent variables are 'as real as their predictive
consequences are valid.' Such a comment implies that the
justification for postulating latent variables is their
theoretical utility rather than their reality.
Summary
The possibility of making causal inferences
about latent variables is one which has great appeal for the
social and behavioural scientist simply because many of the
concepts in which they are most interested are not directly
measurable. Many of the statistical and technical problems
in applying the appropriate models to empirical data
have largely been solved and sophisticated software such as
LISREL means that researchers can investigate and fit
extremely complex models routinely. Unfortunately in their
rush not to be left behind in the causal modelling stakes
many investigators appear to have abandoned completely their
proper scientific scepticism, and accepted models as
reasonable, simply because it has been possible to fit them
to data. This would not be so important if it were not the
case that much of the research involved is in areas where
action, perhaps far-reaching action, taken on the basis of
the findings of the research, can have enormous
implications, for example in resources for education,
legislation on racial inequality etc. Consequently both
producers of such research and audiences or consumers of it
need to be particularly concerned that the conclusions
reached are valid ones. With this in mind we would like to
end with the caveat issued by Cliff (1983):
...beautiful computer programs do not really
change anything fundamental. Correlational data are still
correlational, and no computer program can take account of
variables that are not in the analysis. Causal relations can
only be established through patient, painstaking attention
to all the relevant variables, and should involve active
manipulation as a final confirmation.
====================end excerpt======================
refs cited above:
Goldberg, A.S. 1983. Book review. Contemporary Psych. 28:858-9.
Cliff, N. 1983. Some cautions concerning the application of causal modelling
methods. Multiv. Behav. Res. 18:115-126.
De Leeuw, J. 1985. Book review. Psychometrika 50:371-5.
Kuveri, H. & Speed, T.P. 1982. Structural analysis of multivariate data: a review.
In Sociological Methodology (ed. S. Leinhardt). Jossey-Bass, San Francisco.
Lenk, P.J. 1986. Book review. JASA 1123-4.
Lakatos, I. 1977. The methodology of scientific research programs. Cambridge U
Press, Cambridge.
Fergusson, D.M. & Horwood, L.J. 1986. The use and limitations of structural
equations models of longitudinal data. Pers. Comm. (to Everitt & Dunn).
Bookstein, F.L. 1982. Panel discussion--modelling and methods. In Systems under
indirect observation: causality, structure and prediction (eds J. Joreskog
& H. Wold). North Holland, Amsterdam.
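The excerpt's claim that so-called causal models amount to conditional
independence statements can be illustrated with a small simulation. The
sketch below is not taken from Everitt & Dunn; the function names, path
coefficients (0.6 and 0.5) and sample size are all assumptions chosen for
illustration. It generates data from two chains with opposite arrows,
X -> Y -> Z and Z -> Y -> X, and shows that both imply the same thing about
the correlations: X and Z are uncorrelated once Y is controlled for.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def partial_corr(a, b, c):
    """Correlation of a and b after linearly controlling for c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / math.sqrt((1 - rac ** 2) * (1 - rbc ** 2))


def simulate_chain(n, b1, b2, rng):
    """Simulate first -> middle -> last, each variable with unit variance."""
    first = [rng.gauss(0, 1) for _ in range(n)]
    middle = [b1 * f + math.sqrt(1 - b1 ** 2) * rng.gauss(0, 1) for f in first]
    last = [b2 * m + math.sqrt(1 - b2 ** 2) * rng.gauss(0, 1) for m in middle]
    return first, middle, last


rng = random.Random(0)  # fixed seed so the run is repeatable
n = 20000

# Model A: X -> Y -> Z (hypothetical path coefficients 0.6 and 0.5)
xa, ya, za = simulate_chain(n, 0.6, 0.5, rng)
# Model B: arrows reversed, Z -> Y -> X, same strength on each link
zb, yb, xb = simulate_chain(n, 0.5, 0.6, rng)

r_a, p_a = corr(xa, za), partial_corr(xa, za, ya)
r_b, p_b = corr(xb, zb), partial_corr(xb, zb, yb)

print(f"Model A: corr(X,Z) = {r_a:.3f}, partial corr(X,Z | Y) = {p_a:.3f}")
print(f"Model B: corr(X,Z) = {r_b:.3f}, partial corr(X,Z | Y) = {p_b:.3f}")
```

Both models reproduce essentially the same correlation matrix, so a
goodness-of-fit criterion alone cannot say which way the arrows point. The
direction has to come from somewhere other than the correlations.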
Assume each person has his own growth curve with personal parameters, the time
points being independent observations along the curve.
Assume two or more observations at fixed time (or age) points are taken on
several persons. If the covariance of the time-point observations across persons is
calculated, you get a perfect basis for structural equations between time
points. This is nonsense: each time-point observation is an independent
observation along the person's individual growth curve. If used for
prediction, both models may give equal results. If used for understanding,
the two models differ.
Erik Monness Hedmark College NORWAY
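One way to read the growth-curve argument above is sketched below. This is
an illustration under stated assumptions, not Monness's own model: straight-
line curves, normally distributed personal intercepts and slopes, and
independent measurement noise are all invented for the example, as are the
numeric values. Each person's observations scatter independently around
that person's own line, yet the correlation between adjacent time points
computed across persons comes out high.

```python
import math
import random


def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def column(rows, j):
    """Values at time index j, one per person."""
    return [row[j] for row in rows]


rng = random.Random(1)  # fixed seed so the run is repeatable
n_persons = 5000
times = [0, 1, 2, 3]    # fixed time (or age) points

obs = []    # obs[i][j]: observation on person i at times[j]
resid = []  # resid[i][j]: deviation from person i's own curve

for _ in range(n_persons):
    intercept = rng.gauss(100, 10)  # personal parameter (assumed distribution)
    slope = rng.gauss(5, 2)         # personal parameter (assumed distribution)
    noise = [rng.gauss(0, 3) for _ in times]  # independent along the curve
    obs.append([intercept + slope * t + e for t, e in zip(times, noise)])
    resid.append(noise)

# Across-person correlation between the observations at time 1 and time 2
r_obs = corr(column(obs, 1), column(obs, 2))
# Same correlation for the deviations from each person's own curve
r_resid = corr(column(resid, 1), column(resid, 2))

print(f"corr across persons, raw observations: {r_obs:.3f}")
print(f"corr across persons, deviations from own curve: {r_resid:.3f}")
```

In this setup the large across-person correlation reflects only that people
differ in their curve parameters. No time point influences the next, so a
structural equation linking time 1 to time 2 may predict well while
describing a mechanism that is not there.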
(2) How many lines of a copyrighted book is it ethical to copy and
send out on the Internet?
(3) Everitt and Dunn have some important things to say, as do many
critics of the application of latent variable models. As Phil Woods
points out, however, classical statistics should be considered a
major source for causal language and rhetoric.
It is quite true that many users of computer programs
for analysis of structural models are ignorant of the model
and its assumptions and implications: they are just looking for a
P-value (a big one, in this case!). Same is obviously true of many
users of anova. A question: is data analysis possible without
models? My answer is "no", but we must do a better job of educating
ourselves and our students about what mathematical modelling involves
AND enlarge the class of models that can be utilized.
For a decidedly aggressive view of the potential for deducing
causality from correlations see
DISCOVERING CAUSAL STRUCTURE: Artificial Intelligence,
Philosophy of Science, and Statistical Modeling, by Clark Glymour,
Richard Scheines, Peter Spirtes, and Kevin Kelly.
Orlando: Academic Press, 1987.
--
*************************************************
`o^o' * Neil W. Henry (nhe...@cabell.vcu.edu) *
-<:>- * Virginia Commonwealth University *
_/ \_ * Richmond VA 23284-2014 *
*(804)367-1301 (math sciences, 2079 Oliver) *
* 7-6650 (academic computing, B30 Cabell) *
*************************************************