D: Here's how I think classical test theory, IRT, and ACM are related.
Whenever you see science fiction, that means an important bit of
experimentation is impossible.
The only real axiom of classical test theory since '66 is that the
test score is a random variable. The rest is definitions, statistics,
extensions, and shadowboxing. But it's an important if not crucial
axiom. So let's take a dichotomous variable, as defined on, say, the
event "Guenter beats Kasparov in a game of chess", scored Guenter
wins:1 and Guenter loses:0. The true score is the expectation of this
variable; in this case Guenter's true score equals P[Guenter wins]. We
cannot determine P[Guenter wins] empirically now, but suppose a few
years from now we can wash Guenter's brains, put him in a time
machine, and let him play the same game again, doing this over and
over into infinity to obtain the long run relative frequency of
Guenter winning: this is P[Guenter wins].
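A minimal sketch of the long-run relative frequency idea, assuming the time machine can be imitated by a random number generator whose win probability is fixed in advance. The value 0.02 and the name `long_run_frequency` are hypothetical and chosen only for illustration; in the thought experiment the probability is of course unknown.

```python
import random


def long_run_frequency(p_win, n_replays, seed=0):
    """Replay the same game n_replays times and return the share of wins.

    p_win is the (normally unknown) true score P[Guenter wins]; here it is
    an assumed input so that the estimate can be compared with it.
    """
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    wins = sum(1 for _ in range(n_replays) if rng.random() < p_win)
    return wins / n_replays


if __name__ == "__main__":
    # The relative frequency settles near the assumed true score as the
    # number of replays grows.
    for n in (100, 10_000, 200_000):
        print(n, long_run_frequency(0.02, n, seed=1))
```

The more replays, the closer the relative frequency tends to get to the assumed true score, which is all the time machine is needed for.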
Let's now posit the hypothesis, that if you knew P[Guenter wins] with
the opponent fixed, you knew something important about Guenter's chess
playing ability; specifically, you knew that whenever chess playing
ability increases, P[Guenter wins] increases as a result [test
validity]. Now we want to see if we can get a stronger representation
than this monotonic one, so we bring in ACM. We represent P[Guenter
wins] as a function of the difficulty of his opponent and his ability;
we know that P[Guenter wins] by itself defines an order, so that's
ok. Following the line of thought underlying ACM, we then manipulate
Guenter's ability and Kasparov's difficulty and examine the effect on
P[Guenter wins]. We do this, say, by shrinking and expanding Guenter's
and Kasparov's brains. We know that if the effect on P[Guenter wins]
comes from an additive trade-off between his ability and Kasparov's
difficulty, then there exists a set of functions on P[Guenter wins],
Guenter's ability, and Kasparov's difficulty, that represents this
trade-off up to linear transformations of these functions. This
defines an interval scale, or so important people have claimed. These
people have also promised us that interval scales are really great
things that will make us rich, so we want one. A sufficient condition
for obtaining an interval scale is double cancellation so of course we
check this. We can test it directly on the probabilities from the time
machine [no statistics needed, all deterministic tests even though
they are done on probabilities!] and it turns out to hold.
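A minimal sketch of what such a deterministic check could look like, assuming the time machine has handed us an exact table of win probabilities with one row per ability level and one column per opponent. The function name, the example ability and difficulty values, and both tables are hypothetical. The condition tested is the usual form of double cancellation: whenever p[a2][x1] >= p[a1][x2] and p[a3][x2] >= p[a2][x3], it must follow that p[a3][x1] >= p[a1][x3].

```python
import math
from itertools import product


def double_cancellation_violations(p):
    """Return every (rows, cols) index pair where double cancellation fails."""
    n_rows, n_cols = len(p), len(p[0])
    violations = []
    for a1, a2, a3 in product(range(n_rows), repeat=3):
        for x1, x2, x3 in product(range(n_cols), repeat=3):
            premises = p[a2][x1] >= p[a1][x2] and p[a3][x2] >= p[a2][x3]
            if premises and not p[a3][x1] >= p[a1][x3]:
                violations.append(((a1, a2, a3), (x1, x2, x3)))
    return violations


# Hypothetical table built from an additive trade-off (ability minus
# difficulty) pushed through a monotone logistic function.
abilities = [-1.0, 0.0, 1.5]
difficulties = [-0.5, 0.25, 2.0]
additive = [
    [1.0 / (1.0 + math.exp(-(a - d))) for d in difficulties]
    for a in abilities
]

# Hypothetical table with no additive representation.
non_additive = [
    [0.10, 0.20, 0.90],
    [0.30, 0.50, 0.60],
    [0.40, 0.70, 0.95],
]

if __name__ == "__main__":
    print(len(double_cancellation_violations(additive)))
    print(len(double_cancellation_violations(non_additive)))
```

Because the inputs are exact probabilities rather than estimates, a single violation counts as a refutation; no sampling theory enters at this stage.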
So: It all works great, and we're having fun shrinking and expanding
Guenter's brain, obtaining rock solid interval measures of Guenter's
chess playing ability. Unfortunately, just before we're done, the
ethical committee decides that we can't shrink brains just like that.
We aren't allowed to manipulate Guenter's and Kasparov's chess playing
abilities anymore. We're stuck.
Then we get another bright idea, one that cannot be associated with
any particular school but appears to have crept into psychometrics all
by itself. We notice that, if we supposed that 1) Guenter's chess
playing ability A[G] were in fact lower, say, the same as dumber
Denny's chess playing ability A[D], and 2) all the rest of the model
remained the same, then 3) we could take the probability that Guenter
would win if he had Denny's ability, P[Guenter wins|A[G]=A[D]], to be
the same as the probability that Denny wins given his level of
ability, P[Denny wins|A[D]]. So we impose the assumption that
P[Guenter wins|A[G]=A[D]]=P[Denny wins|A[D]]. Since we still have the
brainwashing time machine, we can still get at P[Denny wins] even
though we can't shrink Guenter's brain anymore. Because we can't
change Kasparov's brain either, we also get Derek in, and Andrew and
all of the other people. We let all of them play against each other,
test the axioms, and represent their abilities on a common interval
scale [of course now under the untested assumption that they're
exchangeable].
Unfortunately our time machine breaks down so we have no way to get at
the probabilities anymore! This is a bummer, but fortunately we still
have statisticians and IRT. Rasch has shown, in this particular
probabilistic case, that if we want to estimate abilities and
difficulties from a single test administration on lots of subjects,
without confounding, say, Kasparov's difficulty and Guenter's ability,
it has to be the case that the functional relation between Guenter's
ability and P[Guenter wins] is logistic and of the same form, while of
different location, for all of his opponents; to the surprise of many,
this is the only function that buys us sufficiency of the total score
and therefore enables conditioning on the total score so as to lose
the unobserved abilities from the statistical equations. If we then
assume, in addition, that a sample of people have all played a sample
of other people [items] we can use the statistics to estimate these by
conditional maximum likelihood [of course, nobody with statistical
taste would use marginal maximum likelihood here!]. Now we can still
go through all the steps above and for instance test for double
cancellation, but now the tests aren't deterministic anymore and
testing double cancellation is equivalent to testing parametric
hypotheses in a probability model.
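The sufficiency point can be checked numerically. The sketch below assumes a Rasch model with three hypothetical opponents (items); the difficulty values and function names are made up for illustration. It computes the probability of a win/loss pattern conditional on the total score and compares it for a weak and a strong player.

```python
import math
from itertools import product


def rasch_p(theta, b):
    """Rasch probability of a win for ability theta against difficulty b."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))


def pattern_prob(pattern, theta, difficulties):
    """Unconditional probability of a 0/1 response pattern."""
    prob = 1.0
    for x, b in zip(pattern, difficulties):
        p = rasch_p(theta, b)
        prob *= p if x == 1 else 1.0 - p
    return prob


def conditional_pattern_prob(pattern, theta, difficulties):
    """Probability of the pattern given its total score."""
    total = sum(pattern)
    same_total = [
        q for q in product((0, 1), repeat=len(pattern)) if sum(q) == total
    ]
    denom = sum(pattern_prob(q, theta, difficulties) for q in same_total)
    return pattern_prob(pattern, theta, difficulties) / denom


difficulties = [-0.5, 0.3, 1.2]  # hypothetical opponent difficulties
pattern = (1, 0, 1)              # win, loss, win

if __name__ == "__main__":
    for theta in (-2.0, 3.0):
        print(theta,
              pattern_prob(pattern, theta, difficulties),
              conditional_pattern_prob(pattern, theta, difficulties))
```

The unconditional pattern probabilities differ with ability, while the conditional ones agree up to floating-point error. That is the property that lets conditional maximum likelihood estimate the difficulties without knowing the abilities.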
In reality, we are at the end of the line, after all the science
fiction, trying to muddle on with the limited opportunities we have.
Best
Denny
--
Denny Borsboom
Department of Psychology
University of Amsterdam
Roetersstraat 15
1018 WB Amsterdam
The Netherlands
phone: +31 20 525 6882
email: d.bor...@uva.nl
homepage: http://users.fmg.uva.nl/dborsboom
Surprise, surprise, I even understand now why you conflate statistical modelling (intentionally I’m NOT saying RM) and ACM! (Obviously a side effect of the brain gymnastics!)
More on that later, but I must first recover from the dizziness after all that “Guenter’s brain shrinking – expanding – time travelling” trip. :-)
Regards,
Dr. Hfuhruhurr …eh, I mean G (1)
(1) Explanation: while reading your post I constantly had to think of myself as that Dr. Hfuhruhurr from “The Man with Two Brains” (1983): http://en.wikipedia.org/wiki/The_Man_with_Two_Brains
By the way, a brilliant movie. :-)
-----Original Message-----
From: talking-m...@googlegroups.com on behalf of Denny Borsboom
Sent: Thu 28.10.2010 21:47
To: talking-m...@googlegroups.com
Subject: [talking-measurement] CTT, IRT, and ACM
When is a measurement not determined? What is a non-determined measurement?
ACM is non-stochastic. It is completely silent on what determines the behaviour of natural systems.
Cheers,
Andrew
-----Original Message-----
From: talking-m...@googlegroups.com [mailto:talking-m...@googlegroups.com] On Behalf Of Trendler, Guenter
Sent: Friday, 29 October 2010 7:11 PM
To: talking-m...@googlegroups.com
Subject: AW: [talking-measurement] CTT, IRT, and ACM
Denny, aside from CTT, I think we are on the same page with regard to the almost certain impossibility of deterministic measurement (ACM) in psychology. Correct?
________________________________________
From: talking-m...@googlegroups.com [talking-m...@googlegroups.com] On Behalf Of Andrew Kyngdon [AKyn...@lexile.com]
Sent: Friday, 29 October 2010 5:08 PM
To: talking-m...@googlegroups.com
Subject: RE: [talking-measurement] CTT, IRT, and ACM
I don't want to conflate ACM with a statistical model, I haven't
intentionally said anything in this direction I hope. I respect and am
aware of the general difference between ACM and statistical models.
The fact that ACM gets played out on probabilities is specific to the
IRT case, it's not a general point about ACM as Andrew correctly
remarks (although he misattributes to me the idea that it is).
What I said, and tried to show, is rather that in practice testing
hypotheses of ACM will imply testing corresponding hypotheses on
parameters of a statistical model if a) we cannot get rid of noise (as
is often the case) or b) we explicitly build the model on
probabilities (as in the IRT case).
Best
Denny
G. Well let's enjoy the moment for I fear it will not last long. ;-)
D. What I said, and tried to show, is rather that in practice testing
hypotheses of ACM will imply testing corresponding hypotheses on
parameters of a statistical model if a) we cannot get rid of noise (as
is often the case) or b) we explicitly build the model on
probabilities (as in the IRT case).
G. Obviously physicists were very soon aware of “noise” in the data and developed different tools to deal with the problem. With regard to the discovery of quantitative structure: random error in repeated measurements was accounted for through the calculation of means. Systematic error was controlled through modification of the experimental apparatus or, if quantifiable, through calculation. Results from different experiments were compared by visual inspection. Sometimes the fit between empirical data and theory was also verified visually (e.g. Ohm, 1826, pp. 151-152). Hence, as Paul points out, physics “didn't need axiomatic theory or statistical models to develop its 'measures'” (a task which was largely accomplished by the end of the 19th century, before the dissemination of statistics and abstract measurement theory) and that should tell us something! Of course, if statistical tests had been available they probably would have been used, as they are standard today (see Taylor, J. R., An Introduction to Error Analysis, Sausalito, CA, 1997).
“Visual inspection” plays an important role even today in physics. For instance Taylor writes in Chap. 6.1 “The problem of rejecting data”:
“Sometimes, one measurement in a series of measurements appears to disagree strikingly with all the others. When this happens, the experimenter must decide whether the anomalous measurement resulted from some mistake and should be rejected or was a bona fide measurement that should be used with all the others. For example, imagine we make six measurements of the period of a pendulum and get the results (all in seconds)
3.8, 3.5, 3.9, 3.9, 3.4, 1.8.    (6.1)
In this example, the value 1.8 is startlingly different from all the others, and we must decide what to do with it. [/] We know from Chapter 5 that a legitimate measurement may deviate significantly from other measurements of the same quantity. Nevertheless, a legitimate discrepancy as large as that of the last measurement in (6.1) is very improbable, so we are inclined to suspect that the time 1.8 s resulted from some undetected mistake or other external cause.” (p. 165)
Now, let's take again Ohm’s (1826) data:
300.75  277.75  238.25  190.75  134.5   83.25  48.5
287     267     230.25  183.5   129.75  80     46
1.05    1.04    1.03    1.04    1.04    1.04   1.05
The series of measurements (last row) doesn’t appear to disagree strikingly.
In comparison the Andrich data:
112   196  116   68    10
12    28   24    16    7
9.33  7    4.83  4.25  1.43
We expected the ratios to be invariant across the continuum (last row), but the values are startlingly different from one another, and we must decide what to do. In fact, given the quantitative hypothesis investigated, “our only really honest course is to repeat the measurement many, many times. If the anomaly shows up again, we will presumably be able to trace its cause, either as a mistake or a real physical effect. If it does not recur, then by the time we have made, say, 100 measurements, there will be no significant difference in our final answer whether we include the anomaly or not.” (Taylor, p. 165) Of course, Ohm’s law also had to be confirmed in many repeated measurements. Ohm’s merit is to have been the first to present data which confirmed invariance across the continuum.
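The ratio comparison can be reproduced in a few lines. This is a minimal sketch: the numbers are the two data sets quoted in this message, while the variable names, the `ratios` and `spread` helpers, and the use of largest-over-smallest ratio as a crude summary of invariance are assumptions made for illustration.

```python
# First and second rows of the two data sets as quoted in this message.
ohm_top = [300.75, 277.75, 238.25, 190.75, 134.5, 83.25, 48.5]
ohm_bottom = [287, 267, 230.25, 183.5, 129.75, 80, 46]
andrich_top = [112, 196, 116, 68, 10]
andrich_bottom = [12, 28, 24, 16, 7]


def ratios(top, bottom):
    """Column-wise ratio of the first row to the second row."""
    return [t / b for t, b in zip(top, bottom)]


def spread(values):
    """Largest ratio divided by smallest; 1.0 would mean perfect invariance."""
    return max(values) / min(values)


if __name__ == "__main__":
    for name, top, bottom in (("Ohm", ohm_top, ohm_bottom),
                              ("Andrich", andrich_top, andrich_bottom)):
        r = ratios(top, bottom)
        print(name, [round(x, 2) for x in r], round(spread(r), 2))
```

For Ohm's series the largest ratio exceeds the smallest by about two percent; for the Andrich series the largest is more than six times the smallest.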
In conclusion, show me the constant ratios and “I will acknowledge, with Paul, that the attribute is indeed quantitative for all practical purposes.”
Regards
Guenter
-----Original Message-----
From: talking-m...@googlegroups.com on behalf of Denny Borsboom
Sent: Fri 29.10.2010 12:40
To: talking-m...@googlegroups.com
Subject: Re: [talking-measurement] CTT, IRT, and ACM
Sorry, but I'm still on the same page, agreeing with almost all you've
said. I even favor binocular significance over statistical
significance as well. My point was however that, in the Andrich
example listed below, you're eventually testing measurement axioms
through statistical hypotheses, just like the rest of the psychometric
world. Whether you do that with your eyes or with a formal test makes
little difference to me. So you're illustrating my point.
I am somewhat surprised that it would just take a model fit test
[binocular or statistical] to persuade you of the presence of
quantitative measurement. I am also surprised that you say "show me
the constant ratios and I will acknowledge, with Paul, that the
attribute is indeed quantitative for all practical purposes." If you
really meant that, you'd be gullible; surely some of the trillion
psychological tests in use today will show the required constancy.
Maybe someone on this forum even has data that do.
Best
Denny
D. Sorry, but I'm still on the same page, agreeing with almost all you've
said. I even favor binocular significance over statistical
significance as well. My point was however that, in the Andrich
example listed below, you're eventually testing measurement axioms
through statistical hypotheses, just like the rest of the psychometric
world. Whether you do that with your eyes or with a formal test makes
little difference to me. So you're illustrating my point.
G. Insofar as there is always random noise in the data,
testing anything always involves testing statistical hypotheses.
D. I am somewhat surprised that it would just take a model fit test
[binocular or statistical] to persuade you of the presence of
quantitative measurement. I am also surprised that you say "show me
the constant ratios and I will acknowledge, with Paul, that the
attribute is indeed quantitative for all practical purposes." If you
really meant that, you'd be gullible; surely some of the trillion
psychological tests in use today will show the required constancy.
Maybe someone on this forum even has data that do.
G. I'm here waiting.
Guenter
"If you really meant that, you'd be gullible; surely some of the trillion
psychological tests in use today will show the required constancy. Maybe
someone on this forum even has data that do."
Denny, bottom line ..
No "psychometric" test on the market today possesses a standard unit against
which magnitudes may be expressed as a ratio.
If it did, the world would know about it, and this list would become
irrelevant overnight. We would be talking about the measurement of a
psychological attribute in the same manner we speak of a measurement of
length or time - with ALL that implies. E.g. an IQ-unit of 1.5 vs 2.5 in a
measurement range of 256 units would translate into observable consequences
in the real world; it would mean something.
The nearest we came to this was Lehrl and Fischer's BIP unit.
I'm sure the MetaMetrics units come close ...
Mathsgarden? Who knows?
It is as the late great Peter Schonemann stated: look at that first
sentence and weep - I'd tattoo that sentence into the forehead of every
"methodologist/psychometrician" claiming to "study the measurement of
psychological attributes"
Schonemann, P. (1994) Measurement: the Reasonable Ineffectiveness of
Mathematics in the Social Sciences. In I. Borg and P. Mohler (Eds.). Trends
and Perspectives in Empirical Social Research. Walter de Gruyter. (p. 158)
"Whatever use axioms may have in mathematics, in an empirical science they
must be either self-evident or empirically founded. However, it is far from
self-evident why the Archimedean axiom should hold in psychology, or in
biology, where most phenomena are bounded by physiological constraints. Nor
is it self-evident why it should always be possible, or even helpful, to
remove interactions as additive conjoint measurement (and the closely
related "functional measurement") try to do. Why should the "crisp"
mathematics of physics apply without change to the fuzzy nature of living
things? Why should subjects always utilize a particular family of distance
functions when they produce dissimilarity ratings, and what prompts them to
always interpolate a monotone transformation so that we always can use the
same canned programs?
None of this is self-evident a priori, nor is any of it empirically founded.
In some instances, as we saw, there is solid empirical evidence to the
contrary, which is simply brushed aside. As Coombs (1983) observed, the line
separating this research strategy from "mathematical game playing in search
of a trivial application ... is an exceedingly difficult line to draw" (p.
93).
What should have been self-evident from the start is that a research
strategy which develops models "independently as a body of abstract formal
theory with empirical interpretations being left to a later stage" was
doomed from the outset. Thus, in the social sciences, the real mystery is
how anyone could have seriously believed the empirical connections would
materialize at a later stage. As the experience of the last 20 years shows,
they didn't."
----------------------------------------------------------------
In reply to your "Attack of the Psychometricians" paper in Psychometrika
(more like 'Attack of the Zombies' I'm afraid!) .. Borsboom, D. (2006) The
attack of the psychometricians. Psychometrika, 71, 3, 425-440, there was a
very understated but very meaningful commentary paper which highlighted the
futile navel-gazing intensity of "modern psychometrics" ... Heiser, W.J.
(2006) Measurement without copper instruments and experiment without
complete control. Psychometrika, 71, 3, 457-461.
p. 457 " One basic reason that measurement in psychology requires statistics
is that psychologists do not use copper instruments anymore, as they used to
do in the nineteenth century. Instead, they determine test or total scores
on the basis of miniature experiments with discrete outcomes, and use a
variety of standard statistical techniques for reaching conclusions on
the basis of observed data. Borsboom (2006) wants us to believe that
psychologists are seriously misled in their hope that they can make progress
this way, and recommends an invasion of psychometricians carrying IRT
missiles and SEM guns into psychology. My prediction is that such an
invasion would simply be ignored. That is not to say that whenever
psychometric modeling really makes a difference, no attempts should be made
to reach the mainstream of psychology. Indeed, many psychometric
contributions that are obsolete according to Borsboom, like Cronbach's alpha
and exploratory factor analysis, in fact entered into the mainstream of
psychology only because they tend to provide sensible answers to real
problems, which cannot be easily surpassed. We should be more proud of them
(even when a bit vulgarized), and carefully foster our accumulated knowledge
base. Apart from strictly psychometric contributions, it has always been a
task of psychometricians to introduce relevant new developments in the broad
domain of mathematics and statistics into psychology. I am convinced that we
should continue to do so, even when it concerns "observed score techniques"
that are so detested in the focus article."
-----------------------------------------------------------------
I think what's required now is a return to fundamental science,
process-dynamics models, evolved-over-time simulations, some natural
curiosity/simplicity of thought/experimentation, and lots of simple
observations, with replication. The very "stuff" of early physics. It's
difficult, requires huge innovation, and attention to even a single outlier
observation. Aggregate data model statistics are not required except as
crude "phenomenon detection" devices.
Which might mean "Talking Measurement" gradually evolves into "Talking
Measurement within an Empirical Science of Psychology".
And, as one last "kick up the pants for those who would call themselves
psychometricians", I'd tell them to read:
'Keeping it simple, Christopher Peterson and Nansook Park on the lasting
impact of minimally sufficient research', The Psychologist, Vol. 23, #5,
Pages: 398-401
Abstract
A special issue of Perspectives on Psychological Science, published by the
American Psychological Society, invited opinions from a variety of
psychologists, including us (Diener, 2009). Our advice was to keep it simple
(Peterson, 2009). We offered this advice not because simplicity is a virtue,
although it is (Comte-Sponville, 2001). Rather, the evidence of history is
clear that the research studies with the greatest impact in psychology are
breathtakingly simple in terms of the questions posed, the methods and
designs used, the statistics brought to bear on the data, and the take-home
messages.
------------------------------------------------------------------
And yes, before the question is asked, I and a bunch of "go for it" students
would be doing work like this if I could ever have worked full-time at a
University, post-Michell. But, my desire to produce really innovative
science and assessment (tearing up many rule-books) does not endear me to
any academic department (or commercial test company). So, I have to earn a
living doing ad-hoc consultancy, and innovating where I can in any
down-times.
It's probably the reason why I worked so well with Hans Eysenck and Paul
Kline - both "rebels" in their own way! Those "let's do it" days and people
are gone, sadly.
Regards .. Paul
---------------------------
Advanced Projects R&D Ltd.
---------------------------
W: www.pbarrett.net
E: pa...@pbarrett.net
M: +64-(0)21-415625
I didn't say anyone has units on par with physics. I suspect we're not
likely to get them in the way we find them in physics, personally,
because the causality of our attributes isn't fit for it. Although I
would of course be very happy to be refuted, and am greatly interested
in work that challenges this idea, I think there will always be a
difference between basic quantities, like length and weight, and
anything we could produce in psychology [it would be interesting to
figure out precisely why; despite the brilliant title, Schonemann, in
my view, doesn't get either the problem or the solution right in the
piece you mention].
However, neither do I think that much of the work in current
psychometrics requires this type of units, or in fact assumes the
quantitative structure of the attributes themselves. Frankly I think
that, if it did, it would be explicit in the theorems or derivable
from them as a corollary. That isn't the case, or at least nobody has
presented the case to me. The Rasch model comes closest, but it's
still a far cry, and what there is in terms of double cancellation
appears to come from the functional form of the IRFs, not from any
hypothesis on the internal structure of the latent variable [except
that it apparently has the capacity to enter into such relations; what
does that mean? No idea].
Nevertheless there is more than order. So the work that does in fact
appear to get at something quantitative I find fascinating, precisely
because I don't understand it [as I tried to explain, perhaps still
without success].
About the Heiser quote: it won't be a surprise that I don't entirely
agree, but on the other hand: You'd almost say that a journal that
publishes such stuff couldn't really belong to a pathological society,
right?
Best
Denny
Does the Rasch Model Convert an Ordinal Scale into an Interval Scale?
T. Salzberger … Rasch Measurement Transactions, 2010, 24:2 p. 1273-5
http://www.rasch.org/rmt/rmt242a.htm
I don't think the argument therein makes much sense, but it does seem
to move in a similar direction as Denny's argument.
Derek
---------------------------------------------------------------------------------------
Derek Briggs
Associate Professor & Program Chair
Research & Evaluation Methodology
School of Education
University of Colorado, Boulder
Boulder, CO 80309
http://www.colorado.edu/education/faculty/derekbriggs/index.html
True, but, consistent with this logic, Darwin didn't know about DNA when he developed his theory of natural selection. So that "should tell us something" about the need to know about DNA.
Galileo didn't know anything about atomic physics when he used water clocks to measure time. So that "should tell us something" about the BIPM using atomic theory to define the second. What are those silly physicists thinking? Atomic clocks don't tell us any more about time than water ones do.
People sometimes recovered from horrific conditions such as blood poisoning before the Australian scientist Howard Florey used Alexander Fleming's discovery of penicillin to invent the first antibiotics. So that "should tell us something" about the need for antibiotics. How many lives have been saved using Florey's invention or variants thereof? About 50 million at least. But yeah, why do we need to know about antibiotics?
Incidentally, Florey shared with Fleming and Ernest Chain, the researcher who brought Fleming's paper to Florey's attention, the 1945 Nobel Prize for Medicine. That "should tell us something" about Nobel Prize selection panels.
I'm being sarcastic, but you can see the point. Ignoring developments in relevant fields hardly constitutes scientific progress.
Andrew
Andrew, maybe I’m misunderstanding your objections... The question is why neither AXIOMATIC measurement theory nor statistical models were needed for the discovery of quantitative structure. (The inversion of the argument does of course not apply. That is, it is not implied that abstract measurement theory and statistics should somehow in general be discarded as superfluous.) The “that should tell us something” was meant as an incentive to reflect on why physics managed so well without them. What is different in psychology? The development of measurement theories like RM and ACM is celebrated as revolutionary, as if what psychology was missing all the time was sophistication in the mathematics of measurement. However, the problem with psychology may never have been on the side of theoretical subtlety, but on the side of reality, which simply refuses to be captured by MEASUREMENT models.
Guenter
-----Original Message-----
From: talking-m...@googlegroups.com on behalf of Andrew Kyngdon
Sent: Mon 01.11.2010 01:23
To: talking-m...@googlegroups.com
Subject: RE: [talking-measurement] CTT, IRT, and ACM