Wood's coin-tossing article

Tom Bramley

Jun 10, 2011, 4:05:50 AM

to talking-m...@googlegroups.com

Paul, Jack

Your recent posts reminded me of something I'd meant to ask the group - how much importance should be attached to Bob Wood's 1978 article showing that random coin tossing data fit the Rasch model?

Wood, R. (1978). Fitting the Rasch model: a heady tale. British Journal of Mathematical and Statistical Psychology, 31, 27-32.

I'd always taken it as I assumed Wood intended - as a bit of light relief. He implies as much in the acknowledgement at the end of the article. I tried such a simulation myself when I first read his article many years ago, and replicated his findings - but of course the Rasch software I used showed that person and item separation reliability was zero, confirming that the 'data' was just noise.

I took this to show that good fit is necessary but not sufficient for a meaningful Rasch (or presumably any) IRT scale. The reliability should be considered too, as is done in the RUMM software which gives an indication of the power of fit tests ('excellent', 'good' etc), based on the value of the reliability coefficient.
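A minimal sketch of the kind of simulation described above, for anyone who wants to try it. It does not fit the Rasch model itself: KR-20 is used here as a simple stand-in for the separation reliability that Rasch software reports, and the sample sizes, ability spread, item difficulties and function names are all assumptions chosen for illustration. The point is only that coin-toss responses give a reliability near zero, whereas responses generated from a Rasch model with real person variation do not.

```python
import math
import random


def simulate_coin_toss(n_persons, n_items, rng):
    """Every response is an independent fair coin toss (pure noise)."""
    return [[rng.randint(0, 1) for _ in range(n_items)]
            for _ in range(n_persons)]


def simulate_rasch(n_persons, n_items, rng, ability_sd=1.5):
    """Responses generated from the dichotomous Rasch model.

    Abilities are drawn from N(0, ability_sd) and item difficulties are
    spread evenly over [-2, 2]. Both choices are arbitrary assumptions.
    """
    difficulties = [-2 + 4 * j / (n_items - 1) for j in range(n_items)]
    data = []
    for _ in range(n_persons):
        theta = rng.gauss(0, ability_sd)
        row = []
        for b in difficulties:
            # Rasch model: P(X = 1) = exp(theta - b) / (1 + exp(theta - b))
            p_correct = 1 / (1 + math.exp(-(theta - b)))
            row.append(1 if rng.random() < p_correct else 0)
        data.append(row)
    return data


def kr20(data):
    """KR-20 reliability of a persons-by-items matrix of 0/1 scores."""
    n_persons = len(data)
    n_items = len(data[0])
    totals = [sum(row) for row in data]
    mean_total = sum(totals) / n_persons
    var_total = sum((t - mean_total) ** 2 for t in totals) / n_persons
    sum_pq = 0.0
    for j in range(n_items):
        p = sum(row[j] for row in data) / n_persons
        sum_pq += p * (1 - p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var_total)


rng = random.Random(1978)  # fixed seed so the run is repeatable
noise = simulate_coin_toss(2000, 20, rng)
structured = simulate_rasch(2000, 20, rng)
print("KR-20, coin tosses:", round(kr20(noise), 3))
print("KR-20, Rasch data: ", round(kr20(structured), 3))
```

With coin tosses the variance of the total scores is roughly the sum of the item variances, so the reliability hovers around zero however well the items might appear to 'fit'.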

Also, I take it to show the importance of having a theory in advance about expected item (and person) ordering instead of applying the model in 'cookbook' fashion. But I don't see his article as showing anything deeper than that. Am I missing something?

Regards,

Tom.

Paul Barrett

Jun 10, 2011, 8:23:04 AM

to talking-m...@googlegroups.com

Hello Tom

In my opinion, you have it just right ...

“I took this to show that good fit is necessary but not sufficient for a meaningful Rasch (or presumably any) IRT scale”

That simple demonstration showed that the Rasch model (and other IRT models) are blind to meaning. They are simply stochastic statistical methods designed to fit certain kinds of probabilistic response functions to data.

Clearly, Wood’s model-fit (and model) was a nonsense in every other respect – but it was a gentle warning heeded by the few who could see beyond the near-incomprehensible, seemingly magical “number crunching” of these models at the time.

Paul Kline and I set out to explore this simple-minded proposition: “it will scale anything thrown at it”, with real data and the Rasch model, back in 1982 ... we attempted to fit the Rasch model to all 90 items in the Eysenck Personality Questionnaire. This was around the time everyone was promoting Rasch IRT as the magical tool to fix measurement in psychological science.

Paul and I felt it was just another statistical method that was blind to meaning. Our test proved our point nicely – rejecting about half the items in the process I think, but fitting a mixture of Psychoticism, Extraversion, Neuroticism, and Social Desirability items pretty well.

Barrett, P. T., & Kline, P. (1981) A comparison between Rasch analysis and factor analysis of items in the EPQ. Personality Study and Group Behaviour, 1, 2, 11-28.

It was only a “quickie study”, an aside to my PhD at the time – rejected by serious journals and sober psychometricians alike (e.g. “we knew this” and “only a fool would attempt to do such a thing”!) ... story of our lives really - we just asked the simple questions! So, Paul got it into an obscure Indian journal in the end. I’ll scan it and get it up on my website.

But, my “bent-ruler” demo in 1999 finished any belief I had in the Rasch model to do anything “special” in and of itself ... it fit my ordinal numbers perfectly, with no chance of it “recovering” the underlying quantitative lengths ... and the funniest (and saddest for me) were some Rasch advocates telling me I didn’t have enough cases or enough random noise in the dataset. The presentation and results are downloadable at:

http://www.pbarrett.net/presentations.html#Beyond_Psychometrics

It’s a bit hairy in places, and reflects my own confusion at the time about certain matters, but again it shows that deploying IRT without a strong theory of what it is you propose to measure, and of why you would use Rasch rather than something else, might be good for practical purposes, but not much else.

It’s ironic, 12 years later, I could now produce the mother of all simulations to more properly empirically isolate and examine the cost of that Achilles heel of Rasch IRT (the issue of the noise it needs to fit response data) – but now I have to earn my blasted living consulting! But, anyway, Joel Michell probably took care of all this from a logical perspective in Michell, J. (2004) Item Response Models, pathological science, and the shape of error. Theory and Psychology, 14, 1, 121-129.

Regards .. Paul

W: www.pbarrett.net

E: pa...@pbarrett.net

M: +64-(0)21-415625

Stephen Humphry

Jun 10, 2011, 9:59:53 AM

to talking-m...@googlegroups.com

Paul, you say:

But, anyway, Joel Michell probably took care of all this from a logical perspective in Michell, J. (2004) Item Response Models, pathological science, and the shape of error. Theory and Psychology, 14, 1, 121-129.

Steve:

Not really. It went something like this. Rasch developed models, starting with the multiplicative Poisson and then moving to the dichotomous. Later, people including David Andrich, Bock, Lord & Novick looked in a post-hoc way at the item response models through that lens. Then Joel argues against depending on the shape of error.

That is not to say Item response models are immune from the criticism, but there is quite an irony in it all for me, as you may appreciate.

Interesting discussion.

Cheers, Steve

Andrew Kyngdon

Jun 11, 2011, 10:09:53 PM

to talking-m...@googlegroups.com

Paul,

You say: “That simple demonstration showed that the Rasch (and other IRT models) are blind to meaning. They are simply stochastic statistical methods designed to fit certain kinds of probabilistic response functions to data.”

The field of decision making under risk and uncertainty is one area of quantitative social science in which emphasis is placed on descriptive, non-stochastic theories. I believe that all psychometricians would benefit from understanding this field.

Stochastic theories are not needed to describe risky choice behaviour, although such theories have been created and undoubtedly have been useful. Asset pricing (Barberis, Huang & Santos, 2001), the behaviour of options traders (Fox, Rogers & Tversky, 1996), insurance policy choice (Johnson, Hershey, Meszaros & Kunreuther, 1993) and even the relatively low incidence of tax evasion (Bernasconi, 1998) can all be explained by the non-stochastic “prospect theory” of Kahneman & Tversky (1979).

To put it further into perspective, Kahneman & Tversky (1979) is the second most cited paper of all time for the journal “Econometrica”. What’s the most cited paper for “Psychometrika”? The last time I looked it was Cronbach’s (1951) paper on alpha. I believe that psychometrics might contribute something more important than Cronbach’s alpha if attention were paid to the description of cognitive abilities, rather than the calibration of tests and the fitting of stochastic models to item response data.

In fairness to Rasch, he developed his multiplicative Poisson model as an attempt to describe reading errors made by Danish schoolchildren. Unfortunately, Rasch’s modern devotees have taken the models developed subsequently to the Poisson as the means by which the problem of psychological measurement will be solved. If only things were so simple.

The Lexile Framework presents perhaps the first sustained attempt to break away from conventional psychometric thinking in regards to cognitive abilities, but what has been the consequence of that? Jack Stenner being banned from publishing in the International Reading Association journals for yet another decade.