iPhone App - 20/20 Vision

Twenty Twenty Vision

Mar 10, 2013, 9:55:41 AM

to Talking Measurement

iPhone App - 20/20 Vision - http://2020visioniphoneapp.weebly.com

Caroline Long

Apr 15, 2013, 1:43:31 PM

to Talking Measurement

Hi there,

I wonder what this group thinks about the following issues.

I come across a very loose use of the word "measure" in assessment literature. For example, someone may describe a test as a "measuring instrument". My view is that an instrument has to earn the right to be called a measurement instrument, and I would prefer to reserve that term for an instrument or a process that approximates measurement to some degree.

What are your views on this?

I would also like to know where the term "fundamental measurement" was first used. OK, I think I have my answer from Rasch.org: an article by Wright.

The statement that "fundamental measurement is not a physical operation, but a theoretical property" from Duncan Luce and John Tukey (1964) liberates the concept from its physical origins. It is another term, though, that can be used loosely, thereby losing its essence. Any comments?

Caroline

Caroline Long (PhD)
Centre for Evaluation and Assessment (CEA)
Science, Mathematics and Technology Education (SMTE)
University of Pretoria
Phone (027) 012 420 5702 or 012 420 4175
Fax (027) 012 420 5723
email caroli...@up.ac.za

Denny Borsboom

Apr 15, 2013, 3:41:35 PM

to talking-m...@googlegroups.com

Hi Caroline,

yes, the term measurement is used in a strict (quantitative) way and in a loose way (meaning something like "assessment" or just "picking up something (anything)"). I haven't studied the use of the term intensively, but my guess is that both uses occur across the sciences (although social scientists, as opposed to natural scientists, don't always have a clear view of the fact that there are two such uses and that their practices are not likely to satisfy the strong meaning). As usual, the semantic police isn't doing its job very well ;-)

Fundamental measurement is extensively discussed in:

Campbell, N. R. (1920). Physics: The elements. Cambridge: Cambridge University Press.

I don't have access to the book now, and don't know off the top of my head whether Campbell invented the term; it may be Bertrand Russell in his Principles of Mathematics (1903) or another source, but you should be able to find it in one of these sources.

Another very influential discussion of the topic that you might find interesting is

Suppes, P., & Zinnes, J. L. (1963). Basic measurement theory. In R. D. Luce, R. Bush, & E. Galanter (Eds.), Handbook of mathematical psychology (pp. 3-76). New York: Wiley.

I myself also value the contribution of Stevens, which is often seen as the source of the extremely colloquial use of the term 'measurement' in the social sciences:

Stevens, S. S. (1946). On the theory of scales of measurement. Science, 103, 677-680.

I think that paper deserves its status as a classic, but opinions on that issue are highly varied on this mailing list.

Best
Denny

--
Denny Borsboom
Department of Psychology
University of Amsterdam
Weesperplein 4
1018 XA Amsterdam
The Netherlands
+31 20 525 6882
d.bor...@uva.nl
http://sites.google.com/site/borsboomdenny/dennyborsboom

John

Apr 15, 2013, 5:19:04 PM

to talking-m...@googlegroups.com

Hi Caroline

Personally, I prefer to reserve the term “measures” for the ability estimates derived from a (Rasch) calibration, to distinguish them from (Classical) scores. You will immediately infer that scores are actually at the ordinal (or, to be more precise, pseudo-ordinal) “Stevens” level, whilst measures are at least at the interval level.
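John's distinction between raw scores and calibrated measures can be illustrated with a small calculation. Under the dichotomous Rasch model the probability of a correct response is exp(θ − b) / (1 + exp(θ − b)), and once item difficulties b have been calibrated, each raw score maps to a maximum-likelihood ability estimate θ in logits. What follows is a minimal sketch under stated assumptions: difficulties are known and fixed, zero and perfect scores are excluded, and the function names are hypothetical rather than taken from any Rasch package. With ten equally difficult items, moving from 5 to 6 correct adds about 0.41 logits, while moving from 8 to 9 adds about 0.81, so equal raw-score steps are not equal steps on the logit scale.

```python
import math

def rasch_prob(theta: float, b: float) -> float:
    """P(correct) for a person at ability theta on an item of difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def ability_from_raw_score(raw: int, difficulties: list[float],
                           tol: float = 1e-10, max_iter: int = 100) -> float:
    """Maximum-likelihood ability estimate for a raw score, by Newton-Raphson.

    Assumes the item difficulties are already calibrated and held fixed."""
    n = len(difficulties)
    if not 0 < raw < n:
        raise ValueError("zero and perfect scores have no finite ML estimate")
    theta = math.log(raw / (n - raw))  # starting value
    for _ in range(max_iter):
        probs = [rasch_prob(theta, b) for b in difficulties]
        expected = sum(probs)                   # expected raw score at theta
        info = sum(p * (1 - p) for p in probs)  # test information at theta
        step = (raw - expected) / info
        theta += step
        if abs(step) < tol:
            break
    return theta

items = [0.0] * 10  # hypothetical: ten items of equal difficulty
for r in (5, 6, 8, 9):
    print(r, round(ability_from_raw_score(r, items), 3))
```

The Newton-Raphson update solves the Rasch score equation (observed raw score equals expected raw score), which is why the raw score is sufficient for θ once the difficulties are fixed.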

Kindly

John

Prof John J Barnard (D.Ed.;Ph.D.;Ed.D.)

Executive Director: EPEC Pty Ltd

www.epecat.com

Caroline Long

Apr 15, 2013, 9:36:39 PM

to talking-m...@googlegroups.com

Thanks, John and Denny, for your comprehensive responses.

The role of the semantic police is, I think, to make the conceptual distinction and then hope that the appropriate vocabulary follows.

Caroline

Dan

Apr 16, 2013, 7:04:25 PM

to talking-m...@googlegroups.com

Caroline,

SS Stevens is largely responsible (**ahem** to blame) for the operational use of measurement as "applying numbers to things according to a rule". The problem is that, by this definition, everyone can measure everything with their own personal yardstick, and we'd all be right as long as we explained the rule we used. The trouble is that this explanation is given poorly, if at all. The reason we have so many psychological and other social science/educational tests is that no one can agree on basic rules or definitions of constructs. Want to create your own "measure"? All you have to do is change the definition of the construct, even just a little bit, and then create a measure for your new construct. After Stevens' definition took off there was an explosion of test design in the social sciences, which has created a mess that no one really wants to look at or fix in any fundamental way. If you take away Stevens' rule, the house of cards falls. As Denny's references show, there are some great thinkers who have made real contributions to measurement theory in the social sciences, Luce being one of my favorites, as well as Suppes.
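One concrete consequence of "numbers assigned according to a rule" is Stevens' own point that an ordinal scale licenses only order-preserving (strictly increasing) recodings. The sketch below, using hypothetical ratings for two groups, shows that two equally admissible ordinal rules can reverse which group has the higher mean; it is an illustration of that general point, not a reconstruction of any particular test.

```python
from statistics import mean

# Hypothetical ordinal ratings (1 = low, 2 = middle, 3 = high) for two groups.
group_a = [2, 2, 2]
group_b = [1, 1, 3]

def is_strictly_increasing(rule: dict[int, float]) -> bool:
    """True if the recoding preserves the order of the categories."""
    keys = sorted(rule)
    return all(rule[lo] < rule[hi] for lo, hi in zip(keys, keys[1:]))

def higher_mean(rule: dict[int, float]) -> str:
    """Which group has the larger mean after applying the recoding."""
    a = mean(rule[x] for x in group_a)
    b = mean(rule[x] for x in group_b)
    return "A" if a > b else "B" if b > a else "tie"

rule_1 = {1: 1, 2: 2, 3: 3}   # the "obvious" numbering
rule_2 = {1: 1, 2: 2, 3: 10}  # equally order-preserving

for rule in (rule_1, rule_2):
    print(is_strictly_increasing(rule), higher_mean(rule))
```

Both rules pass the order check, yet the first favours group A and the second group B, so a comparison of means is not invariant under the transformations an ordinal scale permits.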

The bottom line is that measurement outside of the physical sciences can mean a lot of different things and must be critically examined before drawing any conclusions.

Thanks,

Dan

wal.sc...@sympatico.ca

Apr 16, 2013, 7:53:14 PM

to talking-m...@googlegroups.com

I suppose that I am the heretic in this group of measurement specialists, but I have much less respect for the opinions of Luce, Suppes, and such. In my 1991 Psych Bull article I argued that fulfilling the requirements of representational measurement theory did not necessarily produce a procedure of any use in the development of science. (Schwager, K.W. (1991) “The Representational Theory of Measurement: An Assessment”, Psychological Bulletin 110: 618-626.) [One peer reviewer thought the draft was not worth publishing; the other thought it was the best article he had ever read.] IMHO no internal criteria for measurement instruments can establish the theoretical fruitfulness or practical worth of a measurement procedure - that is why I am much more in favour of validation approaches in their various forms.

Dan seems to suggest that the definition of a construct is primordial; my opinion is that only constructs that make theoretical or practical sense should be allowed. And practical concepts may lead to improvised measurement procedures that do not fit in the SI framework. I have elaborated my views in my Erasmus U dissertation, Theories of Measurement in Social Science, and in articles such as Ontological and Epistemological Presuppositions of Social Theory, Bulletin de Méthodologie Sociologique September 1991 32: 54-80.

(At the moment, however, I am using my research skills to make money in the stock market.)

Denny Borsboom

Apr 17, 2013, 1:27:10 AM

to talking-m...@googlegroups.com

Hi Dan,

I think that if you read Stevens' 1946 paper closely, it's clear that he lays the groundwork for the axiomatic theory (not standard psychometrics which arose from a different tradition). Luce and Suppes stand on his shoulders.

I agree with Walter that the practical worth of axiomatic theory is overvalued, especially in the recent literature on measurement. Its importance is almost exclusively theoretical.

Best

Denny

Stephen Humphry

Apr 17, 2013, 2:15:13 AM

to talking-m...@googlegroups.com

Hi Wal,

No, you’re not the heretic. I know of varying degrees of respect for Axiomatic Conjoint Measurement among group members.

I challenge the foundations of the “foundations”, i.e., the axioms. I reject them as axioms of measurement, and I maintain that the entire approach is ill-guided. It rests upon a false distinction between “mathematical objects and properties” on the one hand and “empirical objects and properties/relations” on the other. I can appreciate the formal work as formal work, but think it has at most modest relevance to measurement. Many things would need to be (or have been) fleshed out much more carefully for me to be convinced of more. The Foundations is supposedly general and therefore applicable to physical measurement. However, physical measurement is invariably based on substantive physical theory, definition and law. SI units are defined in this way; the design principles of instruments are based on substantive theory. There is no separate measurement theory.

It is well and good to claim, as Luce and others did, that a different approach is required in the social sciences. However, there is no acknowledgement (to my knowledge at least) that the relevance of the Foundations to physics has not been tested at all. The references to physical examples are, IMHO, cherry-picked based on consistency with the approach. The examples are chosen where there is some obvious ‘fit’ with the axioms, and there is most certainly no attempt to explain how the measurement theory would be applied to measurement instruments and procedures in physics in a more general manner. To be fair, they explicitly note “we seem to have failed on two scores to measure momentum in the same way as the usual mv formulae …” (Krantz et al., 1971, p. 267).

I hasten to add that I do not see this as all the fault of representational theorists; the confusion on basic points in metrology is sometimes astonishing. Natural outcomes are much more of a check against problems than clarity of thought within the BIPM and its many committees.

All of this aside, though, as I say, I reject the foundations of the ‘foundations’, and I reject the claim that it is a general theory that applies to physics and also to the social sciences. The axioms may have limited material relevance to applied measurement, and so the theorems may have limited relevance. However, some of the statements made by the group are bizarre, such as that it is a pure convention that (for example) a = 1 in p^a = m^a × v^a, where a > 0 (momentum is mass ‘by’ velocity). The trouble seems to be that they take the division and multiplication of quantities to be literal when they are not (as I hope to have explained in my recent paper here http://www.frontiersin.org/Quantitative_Psychology_and_Measurement/10.3389/fpsyg.2013.00113/full ). When we write m/s or ms^-1 it is merely a shorthand for metres travelled per second of time elapsed. When we write a = f/m the nature of the shorthand is a little more involved.
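The formal point Steve attributes to the representational theorists can be checked numerically: if p = m·v, then raising every quantity to the same power a > 0 preserves both the product law and the ordering of momenta, which is why the axioms by themselves do not single out a = 1. The sketch below uses hypothetical mass and velocity values; it illustrates the claim under discussion, not Steve's rebuttal of it, which turns on the substantive meaning of units.

```python
import math

def rescale(x: float, a: float) -> float:
    """Power transformation x -> x**a (strictly increasing for x > 0 when a > 0)."""
    return x ** a

# Hypothetical (mass in kg, velocity in m/s) pairs.
bodies = [(2.0, 3.0), (1.5, 5.0), (4.0, 1.0)]

def momenta(a: float) -> list[float]:
    """Rescaled momenta m**a * v**a for each body."""
    return [rescale(m, a) * rescale(v, a) for m, v in bodies]

for a in (0.5, 1.0, 2.0):
    ps = momenta(a)
    # The product law survives: m**a * v**a equals (m * v)**a.
    law_holds = all(math.isclose(p, rescale(m * v, a)) for p, (m, v) in zip(ps, bodies))
    # Indices of the bodies sorted by (rescaled) momentum.
    order = sorted(range(len(bodies)), key=lambda i: ps[i])
    print(a, law_holds, order)
```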

No, you’re no heretic here, even though some will disagree. I don’t think I was aware of your article in Psych Bulletin. I will check it out as soon as I get a chance.

Regards, Steve

Stephen Humphry | Associate Professor

Graduate School of Education
The University of Western Australia
M428, 35 Stirling Highway, Crawley, WA, 6009
Telephone: +61 8 6488 7008
Fax: +61 8 6488 1052

www.gse.uwa.edu.au
CRICOS Code: 00126G

akyn...@gmail.com

Apr 17, 2013, 4:24:37 AM

to talking-m...@googlegroups.com

Hey all,

D. "I agree with walter that the practical worth of axiomatic theory is overvalued, especially the in recent literature on measurement. Its importance is almost exclusively theoretical."

W. " I argued that fulfilling the requirements of representational measurement theory did not necessarily produce a procedure of any use in the development of science"

The development of psychological theories of risky and uncertain choice means the above statements need to be qualified somewhat. Conjoint measurement provided the formal proof for Daniel Kahneman and Amos Tversky's (1979) prospect theory (it's in the appendix).

What did this paper achieve?

1. It is the most highly cited article of all time in Econometrica;

2. It founded the field of behavioural economics;

3. It led to Daniel Kahneman sharing the 2002 Nobel Memorial Prize in Economic Sciences, the only psychologist to receive this accolade.

Whilst utility theory did not progress on the testing of the conjoint measurement cancellation axioms, it is too extreme to say that the axiomatic approach to quantitative psychology has produced nothing of worth, as Dutch thinkers such as Sijtsma (2012) are too fond of saying.
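For readers unfamiliar with the cancellation axioms Andrew mentions, they are ordinal conditions on a two-factor table that any additive representation f(a) + g(x) must satisfy. Single cancellation (independence) requires rows to be ordered the same way in every column and columns the same way in every row; double cancellation requires that if (a, q) ≥ (f, p) and (f, x) ≥ (b, q), then (a, x) ≥ (b, p). The sketch below is a minimal checker, assuming an error-free table of ordered values (handling response error in real data is a separate, harder problem); the function names and both matrices are hypothetical. Both conditions are necessary, not sufficient, for additivity.

```python
from itertools import product

Matrix = list[list[float]]

def independent(m: Matrix) -> bool:
    """Single cancellation: rows ordered alike in every column, columns alike in every row."""
    rows, cols = range(len(m)), range(len(m[0]))
    rows_ok = all(len({m[a][x] >= m[b][x] for x in cols}) == 1
                  for a, b in product(rows, repeat=2))
    cols_ok = all(len({m[a][x] >= m[a][y] for a in rows}) == 1
                  for x, y in product(cols, repeat=2))
    return rows_ok and cols_ok

def double_cancellation(m: Matrix) -> bool:
    """If (a,q) >= (f,p) and (f,x) >= (b,q), then (a,x) >= (b,p), for all indices."""
    rows, cols = range(len(m)), range(len(m[0]))
    for a, f, b in product(rows, repeat=3):
        for p, q, x in product(cols, repeat=3):
            if m[a][q] >= m[f][p] and m[f][x] >= m[b][q] and m[a][x] < m[b][p]:
                return False
    return True

# Additive by construction: row effect + column effect.
additive = [[r + c for c in (0, 1, 3)] for r in (0, 2, 5)]
# Monotone in both factors, but 2 >= 1 and 7 >= 6 do not yield 4 >= 5.
violator = [[0, 1, 5],
            [2, 3, 6],
            [4, 7, 8]]

for name, m in (("additive", additive), ("violator", violator)):
    print(name, independent(m), double_cancellation(m))
```

The second matrix shows why independence alone is not enough: it is perfectly monotone in both factors, yet no additive row and column effects can reproduce its ordering.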

By contrast, judging by website visits, the most downloaded Psychometrika paper seems to be Cronbach's (1951) paper on coefficient alpha. So can we infer from this that there has been no idea in the past 60 years of psychometrics as important as coefficient alpha?
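For reference, coefficient alpha is a short computation on item and total-score variances: α = k / (k − 1) · (1 − Σ σ²ᵢ / σ²ₜ), where k is the number of items, σ²ᵢ the variance of item i, and σ²ₜ the variance of the total score. A minimal sketch, assuming complete data stored as one equal-length list of scores per item (the function name is hypothetical); sample variances are used throughout, and the choice cancels as long as it is applied consistently.

```python
from statistics import variance

def cronbach_alpha(items: list[list[float]]) -> float:
    """Coefficient alpha for k items; items[i][j] is person j's score on item i."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]  # each person's total score
    item_var = sum(variance(col) for col in items)    # sum of item variances
    return k / (k - 1) * (1 - item_var / variance(totals))

# Hypothetical scores for four people on two items.
print(cronbach_alpha([[1, 2, 3, 4], [1, 3, 2, 4]]))
```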

Moreover, cumulative prospect theory (Tversky & Kahneman, 1992) and rank-sign dependent utility are also "conjoint measurement" theories of utility (Luce, 1998).

Oh yeah, what real-world choice behaviour has prospect theory predicted? Plenty. For example, the relatively low incidence of tax evasion (Bernasconi, 1998), insurance policy choice (Kunreuther et al., 1993), the behaviour of options traders (Fox, Rogers and Tversky, 1995) and even the daily labour supply decisions that New York City cab drivers make (Camerer et al., 1997).

So much for the axiomatic approach, eh?

Andrew

Stephen Humphry

Apr 17, 2013, 4:40:07 AM

to talking-m...@googlegroups.com

Hi Andrew,

I can’t tell you what a surprise it is that you invoked prospect and utility theory :)

The three ‘achievements’ you list are to me notable for their complete lack of substantive empirical consequence; and one cannot prove a theory with empirical import. Yes, purely formal work receives accolades.

So what?

They’re also notable for having nothing to do with measurement and/or the measurement of quantities in well-defined units.

Of course I agree that psychometrics can claim no more (or less).

Steve

akyn...@gmail.com

Apr 17, 2013, 7:38:59 AM

to talking-m...@googlegroups.com

Hi Steve,

The substantive empirical consequence of the development of prospect theory was the prediction of the Allais Paradoxes (Allais, 1953), more commonly referred to now as the "common consequence" and "common ratio" effects. It could also predict both risk-averse and risk-seeking choice behaviour. These are all things that expected utility theory simply could not do. That is the "So what?".
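The common-ratio effect Andrew cites can be reproduced with the cumulative prospect theory functional forms, using the median parameter estimates reported by Tversky & Kahneman (1992): value function exponent 0.88 for gains and losses, loss aversion 2.25, and gain weighting parameter 0.61. The sketch below is a simplified illustration, not Andrew's analysis: it handles only prospects of the form "win x with probability p, otherwise nothing", for which the cumulative weight is just w(p), and it omits the separate weighting parameter for losses. The prospects form a textbook common-ratio pair: both winning probabilities in the second choice are a quarter of those in the first. Under expected utility with any utility function, scaling both probabilities by the same factor cannot reverse the preference; with the inverse-S weighting function it does.

```python
def value(x: float, alpha: float = 0.88, beta: float = 0.88, lam: float = 2.25) -> float:
    """Prospect-theory value function: concave for gains, convex and steeper for losses."""
    return x ** alpha if x >= 0 else -lam * (-x) ** beta

def weight(p: float, gamma: float = 0.61) -> float:
    """Inverse-S probability weighting function (gain parameter)."""
    return p ** gamma / (p ** gamma + (1 - p) ** gamma) ** (1 / gamma)

def simple_gain(x: float, p: float) -> float:
    """Value of 'win x with probability p, otherwise nothing' (assumes x >= 0)."""
    return weight(p) * value(x)

choices = {
    "first": [("3000 for sure", simple_gain(3000, 1.0)),
              ("4000 w.p. 0.80", simple_gain(4000, 0.80))],
    "second": [("3000 w.p. 0.25", simple_gain(3000, 0.25)),
               ("4000 w.p. 0.20", simple_gain(4000, 0.20))],
}
for name, options in choices.items():
    preferred = max(options, key=lambda option: option[1])[0]
    print(name, "->", preferred)
```

With these parameters the sure 3000 is preferred in the first choice and the riskier 4000 in the second, which is the preference reversal that expected utility rules out.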

Yeah, Kahneman & Tversky (1979) and Tversky & Kahneman (1992) do not define the measurement of utility in units, and yes, that is a weakness of their work. But I do not believe that it undermines what these papers achieved. As I have communicated to you, I think it is pretty straightforward to apply the quantity calculus to utility. One could count units of money and establish putative units of utility, but this needs more exploration.

As for utility ignoring measurement, I would say that von Neumann & Morgenstern (1944) are largely responsible, as they mischaracterised a physical quantity as being "just a number". Interestingly enough, earlier economists such as William Stanley Jevons were very interested in Maxwell's (1878) Treatise and measurement, but all this was lost by the time Paul Samuelson, Milton Friedman and John von Neumann began writing. This is perhaps something Joel Michell could explore.

With my post, I wanted to let our Dutch friends know that there do exist psychological theories of risky choice that have had a tremendous influence over the study of a whole social scientific field, and that such theories have formally employed conjoint measurement. Furthermore, in my opinion, such theories have been far more successful in describing actual human behaviour than psychometric models have been...

...and that these utility theories simply have no need for psychometric shibboleths like validity, reliability and your favourite, "dimensionality"...

Denny Borsboom

Apr 18, 2013, 2:37:35 AM

to talking-m...@googlegroups.com

Hi Andrew

I didn't say axiomatic theory was worthless, but that its practical use is highly limited. With this I intended to indicate its use in constructing, analyzing, and validating measurement procedures. Much has been made of the hypothesized use of the theory to bring psychological measures on par with the beloved physics examples, as you well know, but the number of cases in which this feat has been uncontroversially achieved still equals zero. Naturally, the theoretical worth of the theory is enormous. Your example is a case in point, the role of the theory in philosophy of science is another. But as far as I can see, it hasn't been very useful in addressing measurement problems in either case.

It's interesting that one so skeptical as you has no problem measuring the importance of papers through their number of citations. For what it's worth, I do think Cronbach's paper (certainly the Guttman result it is based on) is important, and I think this is rather obvious, so I'll leave it to you to figure out why. Of course, economics is a prime example of a field where quantitative theories of human behavior have flourished, although I wouldn't count economists' handling of measurement problems as particularly impressive. Neither would I consider economics a huge empirical success. Whether a field does or does not have a Nobel prize seems a historical accident to me. But had there been one for psychology, there's a good chance Cronbach would have received it.

Your argumentation for dismissing psychometric concepts like validity and reliability is so impressive that I am unable to think of any counterarguments, so I'll just rest my case.

Cordially

D

akyn...@gmail.com

Apr 24, 2013, 4:17:39 AM

to talking-m...@googlegroups.com

Hey Denny,

Psychological theories of choice under risk (utility) simply have no need for validity or reliability. That may be difficult to accept, but it is nonetheless a reality. By all means, do not take my word for it. Read Tversky & Kahneman's (1992) paper, or even better, Birnbaum's (2008) highly critical paper in Psych Review.

As for economists and measurement, the former have been just as confused about the latter as have psychologists. Most wrongly believe that a measurement is just a number.

Economists are also quite hostile to psychological theories of utility. They prefer to engage in the total fantasy that human beings are strictly rational decision makers under conditions of risk and uncertainty. Much of the success that psychologists have had in utility has been achieved despite the fierce resistance of economists.

Yes, I am skeptical and even more so now that I work with high stakes, curriculum based assessments that are used for matriculating to university. Much of modern psychometrics is more suited to low stakes testing where you are able to do pilot testing, unless you want to conduct psychometric autopsies. I know of colleagues who create very sound high stakes assessments, particularly in mathematics, but know next to nothing of psychometrics.

Cheers,

Andrew

Denny Borsboom

Apr 24, 2013, 6:29:36 AM

to talking-m...@googlegroups.com

Hi Andrew

I think the 2008 Birnbaum paper is very interesting, thanks for alerting me to it. However I don't see why it would obviate the need for concepts that correspond to the central questions of validity (is my instrument picking up the intended source of variation) and reliability (how much random noise is superposed on the signal). Maybe I have the wrong paper (I am looking at New Paradoxes of Risky Decision Making) or maybe I have missed the argument that you intended.

I am also wondering what exactly is your intended reading of the claim that validity and reliability are unnecessary in some fields. Do you mean that the concepts aren't needed (e.g., because people have a different terminology or conceptual framework to express the central concerns) or do you mean that the relevant questions don't arise? Although I don't know of systematic research in this area, I would think the latter interpretation is hard to defend. It is in my experience generally possible to identify cases where the questions of validity and reliability are addressed in some form, whatever field you are looking at. Validity issues arise whenever the question is raised of whether a measurement instrument is in fact sensitive to the intended attribute. For instance, in physics a historically important issue involves the incorrect interpretation of weight differences before vs after burning as a measure of the amount of phlogiston a body contains. In neuroscience an important validity issue is the extent to which BOLD signals used in fMRI indeed depend on neuronal activity (and in what precise way). This type of issue often arises, even though the name "validity" does not seem to be uniformly used to denote it across the sciences. I assume it is needless to say that the assessment of random noise is a daily activity in most branches of science, even though the way the issue is approached and assessed can differ widely. I don't really care whether people discuss this under the header of reliability, measurement precision, information, or SDT. The question that is addressed is the same (note that this does not presume that the issue is addressed equally satisfactorily across fields - obviously the assumptions under which the psychometric approach works are strong, and if they're not met the relevant techniques don't address random error but, e.g., just say what the average correlation between different items is).
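Denny's parenthetical remark can be made concrete with the standardized form of coefficient alpha, which depends only on the number of items k and the mean inter-item correlation r̄: α = k·r̄ / (1 + (k − 1)·r̄). A minimal sketch, assuming a hypothetical equicorrelated correlation matrix (all helper names are illustrative): holding r̄ at 0.3, alpha climbs from about 0.68 with 5 items to about 0.90 with 20, so the coefficient summarizes inter-item correlation and test length rather than the noise in any single item.

```python
from itertools import combinations

def mean_inter_item_correlation(corr: list[list[float]]) -> float:
    """Average of the off-diagonal (upper-triangle) correlations."""
    k = len(corr)
    pairs = [corr[i][j] for i, j in combinations(range(k), 2)]
    return sum(pairs) / len(pairs)

def standardized_alpha(k: int, r_bar: float) -> float:
    """Standardized coefficient alpha from item count and mean inter-item correlation."""
    return k * r_bar / (1 + (k - 1) * r_bar)

def equicorrelated(k: int, r: float) -> list[list[float]]:
    """Hypothetical k x k correlation matrix with every off-diagonal entry equal to r."""
    return [[1.0 if i == j else r for j in range(k)] for i in range(k)]

for k in (5, 10, 20):
    r_bar = mean_inter_item_correlation(equicorrelated(k, 0.3))
    print(k, round(standardized_alpha(k, r_bar), 3))
```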

Do we have some common ground here, or does your criticism run deeper?

Best

Denny

--
You received this message because you are subscribed to the Google Groups "Talking Measurement" group.

To unsubscribe from this group and stop receiving emails from it, send an email to talking-measure...@googlegroups.com.

To post to this group, send email to talking-m...@googlegroups.com.

Visit this group at http://groups.google.com/group/talking-measurement?hl=en.
For more options, visit https://groups.google.com/groups/opt_out.

Steve Humphry

Apr 25, 2013, 12:32:27 AM4/25/13

to talking-m...@googlegroups.com, talking-m...@googlegroups.com

In physics, there is no validity theory just as there is no measurement theory. Valid measurement is based on valid substantive theory and law. The design principles of instruments are invariably connected to theory and law. The principles of validity you've spelled out apply but there's no branch in metrology specifically concerned with validity.

Denny Borsboom

Apr 25, 2013, 2:48:02 AM4/25/13

to talking-m...@googlegroups.com

Hi Steve

Interesting and important observation. I'd agree, but would note that there isn't a branch of validity theory in psychometrics either (unless you want to call me a branch ;-)). For example, Psychometrika has never featured a paper on validity, as far as I know. The few people on the planet who might agree to being called validity theorists are almost all educational testers.

I think you're largely right about metrology and physics, but I would judge the situation somewhat differently. There are general treatments of validity as it arises in physics; however, these aren't called metrology but philosophy of science. Percy Bridgman (who invented operationalism) and Norman Campbell (who worked out fundamental measurement theory) were both physicists and proposed important solutions to the validity problem. Also, some of the main developments in the philosophy of science turn on validity issues: for instance, problems that we would ordinarily consider validity problems in psychology give rise to discussions of the theory-observation relation (e.g., see Carnap), the problem of theoretical terms (see Stegmüller), and even the idea of a scientific revolution (e.g., see Kuhn's discussion of energy pre- and post-Planck).

So the difference is mainly that, instead of having a couple of wacko psychometricians and educational testers handle the issue of validity, the queen of science has an entire branch of philosophy to herself.

Best

Denny

akyn...@gmail.com

Apr 29, 2013, 6:32:21 PM4/29/13

to talking-m...@googlegroups.com

Hello Denny,

Yes, you have the right paper there.

The primary motivation a utility theorist has in proposing a new theory of risky choice is the description and prediction of new choice problems or paradoxes. Historically, the most famous of these are the St Petersburg paradox (Bernoulli, 1738) and the Allais paradox (Allais, 1953). The former could not be explained by the mathematical concept of expected value, hence Bernoulli's (1738) proposal of what became known as expected utility. In turn, the latter violated expected utility (expected utility cannot predict the Allais paradox with any choice of parameters or utility function). Hence Kahneman & Tversky's (1979) prospect theory, which is a generalisation of expected utility theory.
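The two paradoxes can be checked numerically. The sketch below makes assumptions that are mine, not Andrew's: the St Petersburg game is taken to pay 2^k with probability 2^-k for k = 1, 2, ... (conventions for the first payoff differ), Bernoulli's proposal is represented by logarithmic utility, and the Allais problems use the textbook payoffs in millions. Truncating the game shows expected value growing by 1 per extra round while expected log utility converges to ln 4 (a certainty equivalent of 4). For the Allais pairs, the expected-utility differences are identical for every utility function, so no choice of u produces the commonly observed pattern of choosing 1A in the first problem and 2B in the second.

```python
import math


def st_petersburg(max_rounds):
    """Truncated St Petersburg game paying 2**k with probability 2**-k, k = 1..max_rounds.

    Returns (expected value, expected log utility) of the truncated game.
    """
    ev = 0.0
    eu_log = 0.0
    for k in range(1, max_rounds + 1):
        p = 0.5 ** k
        ev += p * 2 ** k  # each round contributes exactly 1 to the expected value
        eu_log += p * math.log(2 ** k)  # Bernoulli-style log utility of the payoff
    return ev, eu_log


# Classic Allais problems, outcomes in millions, as (outcome, probability) pairs.
ALLAIS = {
    "1A": [(1, 1.0)],
    "1B": [(1, 0.89), (0, 0.01), (5, 0.10)],
    "2A": [(1, 0.11), (0, 0.89)],
    "2B": [(5, 0.10), (0, 0.90)],
}


def expected_utility(gamble, u):
    """Expected utility of a gamble under utility function u."""
    return sum(p * u(x) for x, p in gamble)


def allais_eu_differences(u):
    """Return (EU(1A) - EU(1B), EU(2A) - EU(2B)) under utility function u.

    Algebraically both equal 0.11*u(1) - 0.01*u(0) - 0.10*u(5), so expected
    utility must rank 1A over 1B exactly when it ranks 2A over 2B.
    """
    eu = {name: expected_utility(g, u) for name, g in ALLAIS.items()}
    return eu["1A"] - eu["1B"], eu["2A"] - eu["2B"]


if __name__ == "__main__":
    for rounds in (10, 20, 40):
        ev, eu_log = st_petersburg(rounds)
        # exp(expected log utility) is the certainty equivalent under log utility.
        print(rounds, ev, round(math.exp(eu_log), 4))
    for name, u in [("linear", lambda x: x), ("sqrt", math.sqrt), ("log1p", math.log1p)]:
        print(name, allais_eu_differences(u))
```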

It seems to me (and I may be dumb) that this enterprise has nothing at all to do with what psychometricians refer to as validity, i.e., trying to answer the question "Does a psychometric test measure what it is supposed to measure?" Nor does it have anything to do with test reliability. That was the point I was making. If there is a more general or nebulous concept of validity to which you feel the above might belong, then I guess it doesn't really do any harm to label it as such. I just know for a fact that utility theorists don't sit back and think "now how do I demonstrate reliability and validity?". They just don't, and they have been successful in not doing so.

I can attest to this personally because of the theory of utility I have been working on for two years now. My primary aim was to see if I could generalise cumulative prospect theory (CPT) to account for failures of coalescing and for the violations of stochastic dominance that Birnbaum (1997) induced. These are two choice behaviours that CPT cannot account for with any choice of utility or probability weighting functions and any set of parameters. As Tony Marley pointed out to me, however, what I might really have generalised was Rank Weighted Utility. My generalisation seems to work quite well in predicting the results of previous studies; it also generalises Birnbaum's TAX model, and Birnbaum himself has been quite encouraging in his comments. I hope to write it up with Tony sometime this year. We'll see how it goes.

But at no time did I stop myself and ask "how can I demonstrate the reliability and validity of my new theory?". Indeed, nothing from psychometrics helped me at all. That may be my fault, but I don't believe so.

Now, by "reliability", are you basically referring to how much response error there is in psychological data? If I remember correctly, Birnbaum (2008) briefly mentions his work testing how much of the violation of stochastic dominance is due to response error. He found that very few choices can be considered genuine mistakes. There has also been a lot of debate on how to test the transitivity of choices stochastically. But again, this work makes no mention of what I believe psychometricians would call "reliability".

I guess if a utility theorist were to consider "validity", then he or she would ask the question "Does this theory account for all known choice behaviours/paradoxes and the new choice behaviours observed?", or something like that.

Andrew


Paul Barrett

Apr 29, 2013, 11:46:16 PM4/29/13

to talking-m...@googlegroups.com

Andrew, your statement interested me ..

“But at no time did I stop myself and ask ‘how can I demonstrate the reliability and validity of my new theory?’”

I agree; I have never heard anyone ask “how reliable is that theory?” It seems an incongruous question. Reliability is concerned with repeatability, no more, no less. What’s repeated may span from the engineering fabrication of nuts and bolts, whether your car starts every morning, or whether your simple 0-bit reaction time will be the same every day, through to whether the sun will rise and set tomorrow.

Reliability may be affected by both random and non-random influences.

But asking whether a theory is valid or not would seem to be sensible and common across all fields of endeavour. For any theory we might ask: “does your theory explain/predict that which you claim it should explain and predict?” If it does, we adjudge it valid. If it only explains/predicts some instances of what it claims to explain, then we would question the theory as a valid explanation.

When it comes to reliability and validity of a measurement, reliability is no more than repeatability.

The validity of measurement of something, however, would seem to require establishing that the rules by which you construct your measurement are consistent with how objects/people may be said to contain/embody varying amounts of that ‘something’. I think this is compatible with the definition of validity proposed by Denny and colleagues in: Borsboom, D., Mellenbergh, G.J., & Van Heerden, J. (2004). The concept of validity. Psychological Review, 111(4), 1061-1071.

But, when put like that, I can see why no utility theorist would ever talk about validity in that way!

Instead I think they might ask “is theory X more valid than theory Y?” For example, Andrew, would it be sensible to ask whether Rational Utility Theory is a valid theory of human choice behavior?

But maybe we would instead ask “is Rational Utility Theory an accurate theory of human choice behavior?”

Interesting.

Regards .. Paul

Chief Research Scientist

Cognadev.com

__________________________________________________________________________________

W: www.cognadev.com

W: www.pbarrett.net

E: pa...@pbarrett.net

M: +64-(0)21-415625


--
Denny Borsboom
Department of Psychology
University of Amsterdam
Weesperplein 4
1018 XA Amsterdam
The Netherlands
+31 20 525 6882
d.bor...@uva.nl
http://sites.google.com/site/borsboomdenny/dennyborsboom


Denny Borsboom

Apr 30, 2013, 3:49:57 AM4/30/13

to talking-m...@googlegroups.com

Hi Andrew

I can see that you don't encounter validity or reliability issues if you don't engage in empirical assessment. However, in the Birnbaum paper empirical assessments do figure, and they are interpreted in terms of constructs like risk aversion. As soon as an individual's positions on such variables are empirically assessed, the question arises of what the quality of the assessment is. So you would seem to get questions like "do equivalent choice problems yield equivalent responses, ceteris paribus?" (reliability) and "can the observed response patterns indeed be interpreted in terms of risk aversion, or are they caused by some other stimulus dimension?" (validity). How does the utility program succeed in sidestepping these questions?

best

d

dennyborsboom | borsboomdenny


akyn...@gmail.com

May 4, 2013, 5:05:53 AM5/4/13

to talking-m...@googlegroups.com

Hi Paul,

Expected (rational) utility theory (EUT) fails to predict a lot of human choice behaviour under risk, with any choice of utility function or parameters, so it does not accurately describe the psychology of risk. To take just one example, EUT predicts that humans are always risk averse. This, however, is simply not true. If faced with a choice between a sure loss and a merely probable but greater loss, most people choose the latter. Hence people often seek risk when faced with sure losses, which may explain why some gamblers "chase" losses (i.e., continue to gamble after sustaining a series of losses, since if they give up they face a sure loss).
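The preference for a risky loss over a sure one can be sketched by contrasting an expected-utility agent with a concave utility function against a prospect-theory-style value function that is convex for losses. Everything in the sketch is an illustrative assumption: log utility over a hypothetical final wealth of 2000, a sure loss of 500 versus a 50% chance of losing 1000 (equal expected values with these defaults), and the median parameter estimates alpha = beta = 0.88 and lambda = 2.25 reported by Tversky and Kahneman (1992). Probability weighting is deliberately left out to keep the example small.

```python
import math


def pt_value(x, alpha=0.88, beta=0.88, lam=2.25):
    """Prospect-theory-style value: concave for gains, convex and steeper for losses."""
    if x >= 0:
        return x ** alpha
    return -lam * (-x) ** beta


def compare_sure_vs_risky_loss(sure_loss=500.0, risky_loss=1000.0, p=0.5, wealth=2000.0):
    """Compare a sure loss with a p-chance of a larger loss.

    Returns (log-utility agent prefers the sure loss,
             value function prefers the gamble).
    """
    # Hypothetical EUT agent: concave (log) utility over final wealth.
    eu_sure = math.log(wealth - sure_loss)
    eu_gamble = p * math.log(wealth - risky_loss) + (1 - p) * math.log(wealth)
    # Value function over changes in wealth; probabilities are not weighted here.
    v_sure = pt_value(-sure_loss)
    v_gamble = p * pt_value(-risky_loss) + (1 - p) * pt_value(0.0)
    return eu_sure > eu_gamble, v_gamble > v_sure


if __name__ == "__main__":
    print(compare_sure_vs_risky_loss())
```

With these defaults the function returns `(True, True)`: the log-utility agent takes the sure loss, while the value function favours the gamble.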

Hi Denny,

Not sure what you mean by "empirical assessments", but given that you mention the Birnbaum paper I'll take it to mean modal choices in choice problems. Utility theorists don't talk of "constructs" either, just subjective worth (utility).

How risk aversion is explained or described varies between theories. EUT accounts for risk aversion only via the utility function. Theories such as CPT and Birnbaum's TAX account for risk aversion through decision weights. In CPT, these weights are distorted cumulative outcome probabilities. In TAX and the original version of prospect theory (OPT), the weights are distorted individual outcome probabilities. The weighting function used in OPT, however, predicts violations of stochastic dominance that are not observed, hence the "dominance heuristic" that Kahneman & Tversky tacked onto it. The weighting function used in TAX does not.
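The contrast between weighting cumulative probabilities (CPT) and weighting individual probabilities (OPT) can be illustrated for gains with a small, assumption-laden sketch. For simplicity both evaluations use the Tversky and Kahneman (1992) weighting function with gamma = 0.61 and a power value function with alpha = 0.88; original prospect theory did not use this exact weighting function, and Birnbaum's TAX model is not implemented here. The two gambles are hypothetical: `G_split` is obtained from `G` by splitting its 10% branch into two 5% branches that pay slightly less, so `G` stochastically dominates it.

```python
def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman (1992) probability weighting function (gains)."""
    if p <= 0.0:
        return 0.0
    if p >= 1.0:  # also guards against cumulative sums like 1.0000000000000002
        return 1.0
    num = p ** gamma
    return num / (num + (1 - p) ** gamma) ** (1 / gamma)


def gain_value(x, alpha=0.88):
    """Power value function for non-negative outcomes."""
    return x ** alpha


def separable_value(gamble):
    """OPT-style: weight each outcome's own probability separately."""
    return sum(tk_weight(p) * gain_value(x) for x, p in gamble)


def rank_dependent_value(gamble):
    """CPT-style (gains): weights are differences of w on decumulative probabilities."""
    total = 0.0
    prob_better = 0.0  # probability of outcomes ranked above the current one
    for x, p in sorted(gamble, key=lambda xp: xp[0], reverse=True):
        weight = tk_weight(prob_better + p) - tk_weight(prob_better)
        total += weight * gain_value(x)
        prob_better += p
    return total


# Hypothetical gambles as (outcome, probability) pairs; G dominates G_split.
G = [(100, 0.90), (50, 0.10)]
G_split = [(100, 0.90), (49, 0.05), (48, 0.05)]

if __name__ == "__main__":
    print("separable:", separable_value(G), separable_value(G_split))
    print("rank-dependent:", rank_dependent_value(G), rank_dependent_value(G_split))
```

Separable weighting overweights each small branch, so splitting one branch in two raises its total weight and the dominated `G_split` scores higher. Rank-dependent weights are differences of w along the decumulative distribution and always sum to w(1) = 1, so shifting probability to worse outcomes can only lower the value, and the CPT-style evaluation ranks `G` first.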

Andrew
