Maraun on Measurement

Guenter Trendler

Oct 17, 2010, 7:12:37 AM

to Talking Measurement

> Hi all,
> I have never understood the linguistic analysis that much of Maraun's
> work presumes. I am somewhat heartened by the fact that others too
> apparently find it difficult to comprehend. However it would be much
> more interesting if somebody could in fact explain the position in
> terms that ordinary non-Wittgensteinians could also understand.
> Anyone?
> Best
> Denny

Hi Denny,

I also find it quite difficult to summarize Maraun’s arguments, since I
would have to comment critically on them almost sentence by sentence.
But let me give it a try. Leaving critical comments largely aside I
will focus only on the last section of his article “Constraints on
Measurement in Psychology” (pp. 452 - 458) and arrange some quotes in
such a way that his argumentation is smooth and easy to follow.

Maraun differentiates as follows: “A technical concept is a concept
defined by a specialized or expert community, and employed within a
narrow, technical field of application. A common-or-garden concept, on
the other hand, is a concept with a common employment in everyday life
(Baker & Hacker, 1982).” (p. 453)

He first focuses on the latter.

He believes that “in the majority of cases, however, the conceptual
bedrock of psychology is made of common-or-garden concepts. The reason
for this has much to do with the aims of the discipline. Psychology
arose from a need to understand the very same phenomena that are of
interest to authors, poets and the person on the street, phenomena
denoted by common-or-garden concepts. Depression, dominance,
intelligence, happiness, fear, motivation, personality, memory, and so
on, are of interest to the psychologist, as well as to Tolstoy and
Dickens.” (p. 454)

He argues: “Now, the claim that common-or-garden psychological
concepts are not measurable follows from the simple observation that
common-or-garden psychological concepts as they stand are not embedded
in normative practices of measurement. This is a simple observation,
because it rests on whether there exist normative, rule-governed
techniques for taking measurements of dominance, intelligence,
creativity, tension, and so on.” (p. 455) That is, “a practice of
measurement implies normative measurement techniques, instruments and
units of measurement” (p. 456).

Hence, he concludes, “when it is claimed by a psychologist that a
common-or-garden psychological concept has been measured, the concept
is being misused.” (p. 456)

Comment: This is in part much the same argument as Michell’s
assertion that Stevens misuses the concept of measurement. The
meaning of “measurement” is not “the assignment of numerals to objects
or events according to rules”, as Stevens believes, but measurement
means “the estimation of the ratio of one magnitude of a quantity to
another magnitude (the unit) of the same quantity”. Hence, if a
psychologist claims that (common-or-garden) psychological concepts are
measurable he is wrong, because the correct use of “measurement” is
not yet established in psychology. Or again in Maraun’s words: “this
is a simple observation”, because there exist no “normative,
rule-governed techniques for taking measurements of dominance,
intelligence, creativity, tension, and so on”. At the same time,
however, he also argues that in general “common-or-garden
psychological concepts are not measurable”. This obviously does not
follow from the “simple observation” of the pathological measurement
practice in psychology.

Secondly, Maraun concedes that common-or-garden concepts may be
transformed into technical concepts and thereby made amenable to
measurement. “Is it not possible that innovations will occur so that
at some future point in time it will be possible to measure
intelligence, dominance, leadership, and so on? Might it not be the
case that this kind of refinement and innovation is precisely the
currency of modern psychological measurement practice? Certainly,
Wittgenstein himself recognized that refinement and innovation are
fundamental to measurement.” (p. 457) “Cannot a psychologist go ahead
and employ a term such as depression in a new technical sense? Of
course there exist no laws governing the use of concepts by
psychologists: The psychologist may do as he or she wishes.” (p. 458)

But Maraun is sceptical about this possibility. He states: “While
allowing for the logical possibility of innovation, features inherent
to psychological concepts nevertheless seem to make this possibility
an unlikely one. What are the grounds for so pessimistic a conclusion?
Simply put, measurement requires a formalization which does not seem
well suited to what Wittgenstein calls the 'messy' grammars of
psychological concepts, grammars that evolved in an organic fashion
through the 'grafting of language onto natural ("animal")
behaviour' (Baker & Hacker, 1982).” (p. 457)

He believes that the messy meaning of common-or-garden psychological
concepts cannot be reduced to the precise meaning of technical
concepts and illustrates his doubts as follows: “Take, for example,
the concept dominance. Given the appropriate background conditions,
practically any 'raw' behaviour could instantiate the concept. Hence,
Joe's standing with his back to Sue could, in certain instances, be
correctly conceptualized as a dominant action. On the other hand,
Bob's ordering of someone to get off the phone is not a dominant
action if closer scrutiny reveals the motivation for his behaviour to
be a medical emergency which necessitated an immediate call for an
ambulance. The possibility for the broadening of background conditions
to defeat the application of a psychological concept is known as the
defeasibility of criteria (Baker & Hacker, 1982).” (p. 457)

Comment: This, I believe, is a crucial point. What exactly do
psychological concepts refer to? The messy meaning reflects a messy
reality, just as does the messy meaning of physical common-or-garden
concepts. That is in my view the reason why, just like common-or-garden
physical concepts, common-or-garden psychological concepts are not
amenable to measurement. Hence, we must ask, is it possible to restrict
the meaning of psychological concepts to such an extent that they are
suited for the empirical testing of axioms of measurement, i.e. that
they refer to just one homogeneous attribute we want to measure? Or
does "the defeasibility of criteria", and similar objections, apply?

Best
Guenter

Tom Bramley

Oct 25, 2010, 5:09:55 AM

to talking-m...@googlegroups.com

Dear Guenter,

Thank you for taking the time and trouble to present Maraun's position in more detail. In an earlier post, you had said:

GT: "However, to return to the Rasch model: in order to test it we will have to identify the material substrate of ability. The only workable route is in my view to find out if certain items always require the same ability by observing if always the same brain region is active. If this is the case, we can assume to have identified the material substrate. Then we will have to find a way to identify equal levels of ability (e.g. volume of brain mass active in that region). We will have to verify if the same item always is associative with the same level of ability per person, etc."

I've been thinking about this in the light of the Maraun article (and book), and was wondering whether 'running ability' might be a better example to consider, because scientific knowledge about the 'material substrate' of running ability is probably more advanced than with mental abilities. For example 'sprinting ability' is thought to depend (amongst other things) on the proportion of 'fast twitch' and 'slow twitch' muscle. But I take Maraun's point to be that if we were to define 'sprinting ability' in terms of fast twitch muscle mass, we might make it measurable, but we would be changing its meaning. High or low sprinting ability is correctly ascribed to people on the basis of their sprinting performances. However sophisticated our measures of other relevant factors (e.g. body shape, blood oxygen-carrying, "motivation", reaction time, running technique) they will not be measures of sprinting ability, but of the causes of sprinting ability.

Regards,
Tom.

Trendler, Guenter

Oct 25, 2010, 11:25:10 AM

to talking-m...@googlegroups.com

Hi Tom,

Well, yeah, for the purpose of critical discussion it may be a better example. It’s funny that I was also thinking about the Rasch model in sports terms for some time, i.e. in terms of ‘high-jumping ability’ (the higher the bar the high jumper must clear, the more difficult the jumping task).

But maybe it is even more useful to contemplate Maraun’s problem by means of an example from physics, where measurement is already established. For instance, let’s take Hooke’s law, which states that F = -kx (1). Think of F in analogy to ‘sprinting ability’ and of x in analogy to ‘sprinting performance’. Now, how can we establish the law, which says that the restoring force F of the spring is directly proportional to the displacement x of the end of the spring? It’s no problem to determine equal levels of x by measuring the displacement of the spring, but how about F? Usually the spring is stretched by means of weights. Hence the same magnitudes or levels of F are identified by the same magnitudes of the weight G (Newton’s third law (2)). Think of m in G = mg as the ‘twitch muscle mass’.
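The calibration step in this spring example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the masses and displacements are made-up numbers, `estimate_spring_constant` is a hypothetical helper, and g is taken as 9.81 m/s^2. At equilibrium the weight G = mg balances the restoring force, so k * x = m * g, and k can be estimated as the least-squares slope through the origin.

```python
G_ACCEL = 9.81  # assumed gravitational acceleration in m/s^2


def estimate_spring_constant(masses_kg, displacements_m):
    """Estimate k in G = k * x by least squares through the origin."""
    # Levels of force are identified through the weights G = m * g.
    weights = [m * G_ACCEL for m in masses_kg]
    numerator = sum(w * x for w, x in zip(weights, displacements_m))
    denominator = sum(x * x for x in displacements_m)
    return numerator / denominator


# Made-up data: an ideal spring with k = 50 N/m loaded with three masses.
masses = [0.1, 0.2, 0.3]
displacements = [m * G_ACCEL / 50.0 for m in masses]

k = estimate_spring_constant(masses, displacements)
```

The point of the sketch is that F is never read off directly: equal levels of F are fixed by equal weights, and only x is observed on the spring itself.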

Can you apply Maraun’s objection to this example? If not, why not? Where is the difference?

Can you please point out where exactly Maraun (page? publication?) formulates his objection?

Regards,
Guenter

(1) http://en.wikipedia.org/wiki/Hooke%27s_law

(2) http://en.wikipedia.org/wiki/Newton%27s_laws_of_motion

Denny Borsboom

Oct 25, 2010, 2:18:51 PM

to talking-m...@googlegroups.com

Hi Guenter,

thank you for your exposition; it was most helpful. About this:

> Well, yeah, for the purpose of critical discussion it may be a better
> example. It’s funny that I was also thinking about the Rasch model for some
> time in sports terms; i.e. ‘jumping high ability’ (the higher the rod the
> high jumper must jump over the more difficult the jumping task).

The sports analogy is, in my view, quite exact. The Elo rating in
chess works like this, only the other chess players function like
items [actually it's a Bradley-Terry-Luce model rather than a Rasch
model, but this doesn't make much of a difference]. The Elo rating is
spectacularly accurate in predicting the probability of one player
beating another. There are huge databases from online chess games
where you could test this yourself.
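For readers who want to try this, here is a minimal Python sketch of the usual Elo expected-score formula (logistic in the rating difference, with the conventional scale of 400 points). The function names are illustrative choices, and draws are ignored by treating the expected score as a win probability. The second function shows the Bradley-Terry-Luce form of the same model, assuming each player's "strength" is 10 ** (rating / 400).

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def btl_win_probability(strength_a, strength_b):
    """Bradley-Terry-Luce probability that A beats B, given positive strengths."""
    return strength_a / (strength_a + strength_b)


def elo_strength(rating):
    """Strength parameter that turns the Elo formula into the BTL form."""
    return 10.0 ** (rating / 400.0)


# A 400-point gap gives the stronger player an expected score of 10/11.
p_elo = elo_expected_score(1600, 1200)
p_btl = btl_win_probability(elo_strength(1600), elo_strength(1200))
```

Both calls give the same value (about 0.909), which is the sense in which Elo is a Bradley-Terry-Luce model with other players in the role of items.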

Chess is an interesting example because as a dynamic skill, it clearly
depends on lots of qualitative [eg strategy] and quantitative [eg
speed] properties being intertwined in a complicated way. Colleagues
of mine have spent some time trying to isolate specific cognitive
tasks that a chess player must solve to successfully navigate
chess-situations; they then used the mastery of these to predict Elo
ratings. They were quite good at this; however in practice it didn't
add much because the individual differences [which is what one is
interested in here] are stunningly unidimensional.

Best
Denny

--
Denny Borsboom
Department of Psychology
University of Amsterdam
Roetersstraat 15
1018 WB Amsterdam
The Netherlands
phone: +31 20 525 6882
email: d.bor...@uva.nl
homepage: http://users.fmg.uva.nl/dborsboom

Tom Bramley

Oct 26, 2010, 4:23:20 AM

to talking-m...@googlegroups.com

Hi Guenter,
About 12 years ago I contacted the British athletics federation to see if they had electronic data on national or international high jump competitions - I'd planned to analyse them with the Rasch model to investigate the relationship between bar height and logit 'jumping difficulty'. Unfortunately they didn't have any collated data. Things might be different now, I guess. I was not expecting a linear relationship. I had not read any Michell back then and it had not occurred to me to wonder whether 'jumping ability' was a quantitative attribute (and I'm still not completely convinced it makes sense to do so, which is why I appreciate the discussions on this forum so much!)

In my previous post I wasn't quoting Maraun directly, but the bits of his book I had in mind were the 'sabermetrics' example in Chapter 13 'three case studies' of his book, and the discussion of the Boyle-Charles law in Chapter 10 (pages 3-4) http://www.sfu.ca/~maraun/Mikes%20page-%20Myths%20and%20Confusions.html
I think that where his objection might apply in your example is that 'force' and 'displacement' are concepts with specific meanings, whereas this is less the case for 'sprinting ability' and 'sprinting performance' - but perhaps others would disagree.

Thanks for your comment about the chess, Denny - you might be interested in this: http://kaggle.com/chess - a competition to improve on the Elo system. On the 'hints' page there are links to some interesting articles about chess ratings, such as this one: http://www.glicko.net/research/acjpaper.pdf . Amongst many other things, this article discusses whether slow play and rapid play games should have separate rating systems, because they might require different abilities - and if so, at what time limit to 'draw the line' separating slow games from rapid games.

Denny Borsboom

Oct 26, 2010, 10:36:26 AM

to talking-m...@googlegroups.com

Hi Tom

yes I know the Glicko system. In my view Glicko stands to Elo as
Birnbaum stands to Rasch. More parameters, better fit, less principled
justification.

Clearly the difficulty of the bars in your high-jump example wouldn't be
the same as their height, given the limits of the human propulsion
system (legs). In fact the function would have to be extremely
nonlinear if at all continuous. Interestingly, if one had to predict
the probability of a randomly chosen jumper clearing or failing a bar, one
would always prefer the IRT difficulties to their physical heights.
But if one had to construct a bar such that a jumper had, say, a 50%
probability of clearing it, one could not do this from the IRT model
alone, i.e., without knowing the required physical height.
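A small Python sketch may make the two halves of this point concrete. It assumes a Rasch model for clearing a bar, and it uses an entirely hypothetical calibration table that pairs bar heights (in metres) with logit difficulties; the numbers are invented for illustration and are not high-jump data. The model alone yields the logit difficulty at which a jumper has a 50% chance; turning that into a physical bar requires the separate height-to-logit calibration.

```python
import math


def rasch_probability(ability, difficulty):
    """Probability of clearing the bar under the Rasch model (logit units)."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def difficulty_for_target(ability, target_p):
    """Logit difficulty at which a jumper of given ability succeeds with target_p."""
    return ability - math.log(target_p / (1.0 - target_p))


def height_for_logit(target_logit, calibration):
    """Linear interpolation in a (height, logit) table sorted by height."""
    for (h0, b0), (h1, b1) in zip(calibration, calibration[1:]):
        if b0 <= target_logit <= b1:
            return h0 + (h1 - h0) * (target_logit - b0) / (b1 - b0)
    raise ValueError("target difficulty lies outside the calibrated range")


# Hypothetical, deliberately nonlinear calibration: (bar height in m, logit difficulty).
CALIBRATION = [(1.60, -3.0), (1.80, -1.0), (2.00, 0.5), (2.20, 2.5)]

jumper_ability = 0.5
needed_logit = difficulty_for_target(jumper_ability, 0.5)
needed_height = height_for_logit(needed_logit, CALIBRATION)
```

Without `CALIBRATION` the last step cannot be carried out, which is exactly the asymmetry described above: prediction runs on logits, construction needs heights.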

Best
Denny

Trendler, Guenter

Oct 26, 2010, 10:56:27 AM

to talking-m...@googlegroups.com

I believe we must take into account that Maraun’s main target is not the application of (representational) measurement theory in psychology, but psychometrics; more precisely, conceptual issues concerning the symbols involved in latent variable models. Maraun’s work is most certainly a remarkable and indispensable conceptual critique of psychometrics and Wittgenstein’s philosophy may indeed be helpful.

However, in my view he does not do justice to axiomatic measurement theory by lumping it together with psychometrics, and his critique of it, covering just a few lines, seems a bit overhasty to me (Maraun, 1998, pp. 451-452). Maybe he goes into more detail somewhere else, and I would be happy if someone could point it out to me.

Unfortunately, Maraun (1998) is often awfully ambiguous. For instance, on p. 436 he ascribes to Wittgenstein the following definition of measurement: “measurement is a normative, rule-guided practice”. So far, so good, but furthermore he attributes to him the statement that “it is incoherent to claim that whether, and how, something can be measured is a matter of empirical discovery.” (ibid.) WHETHER and HOW something can be measured is not a matter of empirical discovery? Can he really be serious about this?

In order to avoid misunderstanding this statement and the critique relying on it (e.g. of construct validation theory on p. 445f), which, by the way, I think is valid, we could modify it in the following manner: “If the measurement practice of an attribute is already established, then the questions WHETHER and HOW it is measured are no longer empirical questions, since they have already been answered in antecedent experiments.” Or in other words, “it is incoherent to claim that the MEANING of what (e.g. length) is measured in a measurement practice (e.g. length measurement practice by means of rods) is a matter of empirical discovery.”

Hence, if someone claims simultaneously that “this scale measures anxiety” and that “we will be able to say what anxiety is when we know all the laws involving it” (Cronbach & Meehl quoted on p. 446), then he stumbles into Maraun’s trap. That is, if someone claims that he can measure anxiety he must already know the meaning of “anxiety”. In an established measurement practice it doesn’t make sense to say that “I measure something, but I don’t know what it is”.

Guenter

Maraun, M. (1998). Measurement as a normative practice: Implications of Wittgenstein's philosophy for psychological measurement. Theory and Psychology.

Trendler, Guenter

Oct 26, 2010, 11:28:25 AM

to talking-m...@googlegroups.com

Here are two quotes from Maraun’s "Myths and Confusions" (1), which I urge everybody interested in measurement to take seriously into consideration.

“i. The ordinary language sense of the concept hitting ability has a complex grammar. That is, the rules that fix the correct employments of the concept are not of the simple necessary and sufficient condition type. There are multiple and diffuse criteria for the ascription of the term. The grounds for correctly ascribing the concept to a baseball player might be said to be messy. This grammatical messiness can create a feeling that the concept is ineffable, and that this has something to do with unobservability. However, no concept is ineffable (else, it would not be a concept). If one adopts the technical homonym that takes batting average to be hitting ability, then one side-steps the problem of a complex grammar, but, on the other hand, is restricted to studying a phenomenon that is different than that which was originally of interest.

ii. When sabermetricians succeed in clearing up their equivocation over the concept hitting ability, they will have succeeded in clarifying the phenomena that they must study in a study of the hitting abilities of baseball players. They will, only then, have something to empirically investigate in a study of hitting ability. This something will undoubtedly turn out to be empirically complex. Empirical investigation will, undoubtedly, involve the complex issue of the causes of superior hitting ability.” (p. 3)

“When the sabermetricians eventually settle their conceptual issues and come to speak unambiguously about hitting ability, there will then arise legitimate measurement problems and the potential for refinements in measurement.(…) But such refinements are only possible given a settled conceptual foundation.” (p. 4)

Comment: I just want to emphasize that, according to Maraun, clarifying the phenomena and settling the conceptual foundations are two sides of the same coin, an idea I totally agree with.

(1) http://www.sfu.ca/~maraun/Mikes%20page-%20Myths%20and%20Confusions.html

Denny Borsboom

Oct 26, 2010, 11:39:59 AM

to talking-m...@googlegroups.com

> However, in my view he does not do justice to axiomatic measurement theory by lumping it together with psychometrics.

Interesting line. What exactly would axiomatic theory as applied in
psychology be, if not psychometrics?

Best
Denny

Trendler, Guenter

Oct 26, 2010, 11:49:50 AM

to talking-m...@googlegroups.com

Oh, c'mon! You know what I mean!

Tom Bramley

Oct 26, 2010, 12:01:35 PM

to talking-m...@googlegroups.com

Hi Guenter,
The very last page of chapter 9 (Latent variable interpretation) of Maraun's book mentions representational/axiomatic measurement. He seems to present Michell as a representationalist (!)
A quote:
"It is not, however, suggested that the insights of the axiomatic/representational
approach are not profound, and of great import to the social and behavioural sciences, but only
that application of these insights cannot deliver measurement."
http://www.sfu.ca/~maraun/Mikes%20page-%20Myths%20and%20Confusions.html

Regards,
Tom.

Denny Borsboom

Oct 26, 2010, 1:37:50 PM

to talking-m...@googlegroups.com

> Oh, c'mon! You know what I mean!

No I don't, actually. The models in ACM yield ways of representing
empirical relations numerically. If these models are used in
psychology that makes them psychometric models. It would be absurd to
suggest that if somebody does ACM, they are not doing psychometrics.
Also, it would represent an unproductive denial of the contributions
of ACM scholars to psychometrics. Just a few reminders; Duncan Luce
has served as president of the Psychometric Society and has written
several important contributions to Psychometrika, the most recent one
in 2005, as have related scholars such as Falmagne and Batchelder;
entire lines of psychometric modeling have been explicitly based on
ACM, such as those presented by Scheiblechner in the '90s; and some
papers in Psychometrika, in my view, rank among the most important
papers in ACM [eg Ellis & Junker, 1997].

It is simply provincial to suggest that if you do ACM then you can't
be doing psychometrics or the other way around. See:

Ellis, J. L. & Junker, B. W. (1997). Tail-measurability in monotone
latent variable models. Psychometrika, 62, 495-523.

Luce, R. D. (2005). Measurement analogies: Comparisons of behavioral and
physical measures. Psychometrika, 70, 227-251.

Falmagne, J.C. (1989). A latent trait theory via stochastic learning
theory for a knowledge space. Psychometrika, 54, 283-303.

Scheiblechner, H. (1999). Additive conjoint isotonic probabilistic
models. Psychometrika, 64, 295-316.

Best
Denny

Jack Stenner

Oct 26, 2010, 2:29:46 PM

to talking-m...@googlegroups.com

Hello Denny,

Do you think it likely that the chess ability on which I differ from you today is the same ability on which I differ from myself 40 years ago? If so, the "stunning unidimensionality" observed for chess ability may function the same within and between subjects. My hunch is that we can trade off a change in task difficulty (you play me down a queen and a rook) for our difference in ability (you are better than me by one logit) to produce a predictable change in the measurement outcome (this time we draw). Would this not mean that we have a quantitative attribute? All this in the face of the extraordinary complexity that seems to underpin this attribute as viewed from either the person face or the task face.

Best
Jack

Jack Stenner
Chairman & CEO
MetaMetrics, Inc.
Developer of the Lexile and Quantile Frameworks
1000 Park Forty Plaza Drive, Suite 120
Durham, NC 27713
Tel: 919-547-3402 Fax: 919-547-3401
jste...@Lexile.com
Web: www.MetaMetricsinc.com | www.Lexile.com | www.Quantiles.com

Denny Borsboom

Oct 27, 2010, 4:06:53 AM

to talking-m...@googlegroups.com

Hi Jack,

this is the million dollar question. If I knew a good answer I'd write
a paper, or better, a book.

For now it looks to me as if the following two theses are both true.
The challenge is in understanding why.

Thesis 1. At the intra-individual level, progress in skills like chess
is not smoothly continuous, has a complicated topology (i.e., is not
unidimensional), and does not seem causally homogeneous.

1a) Chess skill isn't continuous because if you learn a new move or
trick or strategy, you can suddenly do qualitatively new things (e.g., the
available moves change).

1b) Chess skill isn't unidimensional because you can move through the
space of subskills and strategies in various ways (e.g., you can first
learn strategy 1 or strategy 2, there is no prescribed progression).

1c) Chess skill isn't causally homogeneous because it depends
on/consists of lots of sub-abilities (vision, perception,
categorization, memory, speed, etc) that can be present in various
constellations; we have little insight in how this works but it seems
clear that two players can be equally good (e.g., have a probability
of 50% of winning against each other) while they have different
constellations of skills (e.g., player 1 is fast but imprecise while
player 2 is slow but precise etc).

Conclusion: At the intra-individual level it seems to me that an
increase in chess skill isn't like an increase in a quantitative
attribute, i.e., it doesn't look like growth in bodily height for
instance, which is continuous (if you want to get from 1.7 to 1.8
meters, you need to pass all points in between), unidimensional (you
can only go up or down), and homogeneous (there is only one way of
being 1.7 meters tall and that's by having a given extension in
space).

Thesis 2. Individual differences in chess skill, whether measured by
choose-a-move tests or by actual playing, appear to be unidimensional,
(virtually) continuous, and at a minimum have certain quantitative
properties.

2a) The individual differences are unidimensional in the sense that a
single dimension, derived from people's performance on tasks or games,
generally does better than constellations of subdimensions, skills,
'types of players', or other more complicated representations. Here I
take "does better" to mean e.g.: is favored over multidimensional
representations in modeling exercises by indices like AIC, is more
robust to changes in research design or test details, has better
predictive properties, and is amenable to controlled manipulations
like adaptive testing or matching players of the same skill level to
each other.

2b) The individual differences are (virtually) continuous: It appears
possible, for any two players with a reasonable difference in chess
ability, to find another one who has an ability in between them. This
is within margins of precision of course, but still quite good.

2c) The individual differences have at least some quantitative
properties: In adaptive testing, you can control the percentage
correct to a very significant degree as you know well. If you have a
good database of chess items, and you have a good estimate of a
person's ability, then you can pick a set of items of which he or she
will make, say, X% correct for any percentage X you like. Sometimes
you can even do this by constructing the items from theoretical
principles (as in the Lexile) but even if you can't this is in general
very, very, very accurate for domains like chess. Likewise, if you
call a percentage, say 40%, I can pick you a player A and a player B
such that player A will win 40% of the time based only on their Elo's.
Maybe it will be 42% or 38% but I won't be far off. Your Lexile work
shows comparable precision. You have to really be ultra-skeptical not
to be impressed by the level of control that adaptive testing (of which
this is an example) yields; it also appears clear to me that the
control has at least some quantitative properties and it seems like
only a dogmatic person would deny this.

Conclusion: The evidence that chess ability, as an individual
differences variable, is unidimensional, continuous, and has at least
some quantitative properties or implications is significant.
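The "call a percentage, pick two players" claim in 2c can be sketched as follows. This is a minimal illustration, assuming the standard logistic Elo curve with a 400-point scale and ignoring draws; the function names are hypothetical. Given a target win probability, the model prescribes the rating gap between player A and player B.

```python
import math


def elo_expected_score(rating_a, rating_b):
    """Expected score of A against B under the logistic Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def rating_gap_for_win_probability(p):
    """Rating difference (A minus B) at which A's expected score equals p."""
    return 400.0 * math.log10(p / (1.0 - p))


# For a requested 40%, player A should be rated roughly 70 points below player B.
gap = rating_gap_for_win_probability(0.40)
rating_b = 2000.0            # arbitrary illustrative rating
rating_a = rating_b + gap    # about 1929.6
check = elo_expected_score(rating_a, rating_b)
```

Whether observed win rates actually land near the requested 40% is the empirical question; the sketch only shows what the rating scale predicts.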

So what do we do? Do we conclude that chess ability is a continuous
quantity like length, measurable by tests and game performance? I
would hesitate, given the force of the objections under (1). Do we
then conclude that chessometrics is a pathological science, claiming
quantitative structure where there is none and deceiving the
chess-playing community, leading them down Stevens' primrose path in
suggesting that Elo ratings are quantitative? Again, I would hesitate,
given the force of the evidence in (2).

Best
Denny

Paul Barrett

Oct 27, 2010, 5:30:04 AM

to talking-m...@googlegroups.com

Denny, what is the preferred or conventional standard unit of chess-playing
ability?

Regards .. Paul

---------------------------
Advanced Projects R&D Ltd.
---------------------------
W: www.pbarrett.net
E: pa...@pbarrett.net
M: +64-(0)21-415625

Denny Borsboom

Oct 27, 2010, 5:58:37 AM

to talking-m...@googlegroups.com

The Elo rating is the standard rating system.
best
denny

Tom Bramley

Oct 27, 2010, 6:11:24 AM

to talking-m...@googlegroups.com

As in the history of traditional weights and measures, there are local systems too. In England the top players will have FIDE Elo ratings (the international standard) but club/county players will just have an ECF rating. There is a table showing approximate conversions here: http://grading.bcfservices.org.uk/help.php#notes

The conversions are linear, like Celsius / Fahrenheit. The absolute zero of chess ability awaits discovery! The ECF ratings can be given a rough interpretation in terms of probability of success, but I have forgotten what it is - I think that a difference of 10 (but it could be 20) rating points corresponds roughly to odds of 3:1.

(As an aside - as befits a 'psychometric backwater' the ECF system is much simpler than Elo but it seems to be reasonably fit for purpose).

Regards,
Tom.

Jack Stenner

Oct 27, 2010, 10:43:22 AM

to talking-m...@googlegroups.com

Thanks Denny, beautifully done!

Relative to thesis 1: individual molecules in a monatomic gas have totally unpredictable trajectories, sometimes gaining energy following collisions and sometimes losing energy following collisions with the sides of the container (a complicated topology). Most telling for our purposes is that individual molecules don't have a temperature. Temperature is an emergent attribute of the ensemble, highly complicated in its origin yet usefully treated as unidimensional at the ensemble level. The very real complexity at the molecular level gets averaged over. This is the great insight of Gibbs and, independently, Einstein.
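As a rough illustration of this averaging, the following Python sketch draws random velocities for many atoms and recovers a single temperature from the ensemble. It assumes an ideal monatomic gas with Maxwell-Boltzmann velocities, uses an approximate helium atom mass, and fixes a random seed purely for reproducibility; the function names are illustrative. The relation used is that the mean translational kinetic energy per atom equals (3/2) k_B T.

```python
import random

K_B = 1.380649e-23   # Boltzmann constant in J/K
MASS_HE = 6.646e-27  # approximate mass of one helium atom in kg


def sample_velocities(n, temperature, mass, rng):
    """Draw n velocity vectors with independent Gaussian components."""
    sigma = (K_B * temperature / mass) ** 0.5
    return [
        (rng.gauss(0.0, sigma), rng.gauss(0.0, sigma), rng.gauss(0.0, sigma))
        for _ in range(n)
    ]


def ensemble_temperature(velocities, mass):
    """Temperature implied by the mean kinetic energy of the ensemble."""
    total_ke = sum(
        0.5 * mass * (vx * vx + vy * vy + vz * vz) for vx, vy, vz in velocities
    )
    mean_ke = total_ke / len(velocities)
    return 2.0 * mean_ke / (3.0 * K_B)


rng = random.Random(0)
velocities = sample_velocities(20000, 300.0, MASS_HE, rng)
t_est = ensemble_temperature(velocities, MASS_HE)
```

Individual kinetic energies in `velocities` scatter widely, yet `t_est` comes out close to the 300 K used to generate them; no single atom carries that number.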

As to skills, processes and strategies, one solution to the lack of smooth continuity is to posit that these are not the "stuff" of chess ability but rather are what you do with chess ability. In reading theory, some argue that strategies like summarizing or retelling make up this complicated ability called "reading". Alternatively, we view these strategies as things readers can do well when they are targeted (reader ability and text complexity match) and do poorly when the text is relatively too difficult. Thus the causal action flows from the ability to the strategies and not the reverse. In this view it is not a failing that there is no prescribed progression in strategies either within or between subjects. Conclusion: At the intra-individual level it may be that chess skill is quantitative.

As to your excellent question, "how do we proceed", I believe it is healthy to begin with a quantity hypothesis precisely because we know how to test the hypothesis. Specifically: (1) build a theory of chess board or next-move complexity, (2) use the theory to calibrate, say, 100 million board layouts, (3) conjointly measure players and empirically calibrate a large sample of boards, (4) regress empirical complexity on theoretical complexity, (5) test the trade-off property (a one-logit increase in theoretical complexity can be traded off, or offset, by a one-logit increase in player ability to hold the probability of success constant), (6) confirm the trade-off up and down the scale, (7) write the book. Best. Jack
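The trade-off property in step (5) can be illustrated with a minimal sketch. It assumes a Rasch-type logistic model in which the probability of success depends only on the difference between player ability and board complexity, both in logits; the post does not spell out the model, and the names `p_success` and `trade_off_holds` are hypothetical. Within such a model the trade-off holds by construction, so the empirical question is whether observed success rates behave the same way.

```python
import math

def p_success(ability, complexity):
    """Probability of success under an assumed Rasch-type model (inputs in logits)."""
    return 1.0 / (1.0 + math.exp(-(ability - complexity)))

def trade_off_holds(ability, complexity, shift=1.0, tol=1e-12):
    """Check that raising ability and complexity by the same number of logits
    leaves the probability of success unchanged."""
    base = p_success(ability, complexity)
    shifted = p_success(ability + shift, complexity + shift)
    return abs(base - shifted) < tol

# Check the property "up and down the scale", from -4 to +4 logits.
grid = [x / 2 for x in range(-8, 9)]
all_ok = all(trade_off_holds(a, c) for a in grid for c in grid)
```

With real data, `p_success` would be replaced by observed proportions of success at each ability and complexity level, and the check would then be a test rather than an identity.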

Paul Barrett

Oct 27, 2010, 3:18:09 PM10/27/10

to talking-m...@googlegroups.com

-----Original Message-----
From: talking-m...@googlegroups.com
[mailto:talking-m...@googlegroups.com] On Behalf Of Denny Borsboom
Sent: Wednesday, 27 October 2010 10:59 p.m.
To: talking-m...@googlegroups.com
Subject: Re: [talking-measurement] Maraun on Measurement

The Elo rating is the standard rating system.
best
denny

===============================

No Denny, apparently it's not "standard" in the sense I was asking for - a
magnitude quantity possessing a standard unit against which any other
magnitudes may be compared (as say in time, length, mass etc.).

From Wikipedia:
http://en.wikipedia.org/wiki/Elo_rating_system

"Different ratings systems

The phrase "Elo rating" is often used to mean a player's chess rating as
calculated by FIDE. However, this usage is confusing and often misleading,
because Elo's general ideas have been adopted by many different
organizations, including the USCF (before FIDE), the Internet Chess Club
(ICC), Yahoo! Games, and the now-defunct Professional Chess Association
(PCA). Each organization has a unique implementation, and none of them
precisely follows Elo's original suggestions. It would be more accurate to
refer to all of the above ratings as Elo ratings, and none of them as the
Elo rating.

Instead one may refer to the organization granting the rating, e.g. "As of
August 2002, Gregory Kaidanov had a FIDE rating of 2638 and a USCF rating of
2742." It should be noted that the Elo ratings of these various
organizations are not always directly comparable. For example, someone with
a FIDE rating of 2500 will generally have a USCF rating near 2600 and an ICC
rating in the range of 2500 to 3100."

And..
"Because of the significant difference in timing of when inflation and
deflation occur, and in order to combat deflation, most implementations of
Elo ratings have a mechanism for injecting points into the system in order
to maintain relative ratings over time."

And..
"The current system in the United States (Glicko) includes a bonus point
scheme which feeds rating points into the system in order to track improving
players, and different K-values for different players.[19] Some methods,
used in Norway for example, differentiate between juniors and seniors, and
use a larger K factor for the young players, even boosting the rating
progress by 100% for when they score well above their predicted
performance.[21]"

And look at this:
"Elo's central assumption was that the chess performance of each player in
each game is a normally distributed random variable. Although a player might
perform significantly better or worse from one game to the next, Elo assumed
that the mean value of the performances of any given player changes only
slowly over time. Elo thought of a player's true skill as the mean of that
player's performance random variable."

The Elo system is a statistical system, not a deterministic "measurement"
system. Like IRT.

This is not "quantitative measurement" but useful/practical
statistical-aggregate defined measurement. There is no standard unit, no
sense of what 0 means in an Elo system. And apparently no metric
"translation" of units (e.g. cm, inches, Angstroms, meters etc.).

Frankly, when you read through how the math is applied, and the operation of
the systems in chess - it all looks like a variation of numerology, really.
However, it predicts performance to a clearly useful degree and has shown
great utility in other sports.

But it's not quantitative measurement, it's a useful, robust, predictive
rating scheme formed to classify players, tournaments, performance etc.

Likewise the "mathgarden" exercise. Likewise Lexiles.

Psychometricians build statistical models which do pretty damn well in some
settings except when predicting the precise value for any individual using a
model based upon aggregate performance.

If anything has become clear over the decades, it's that you cannot
construct quantitative measurement (or examine causality) using probability
distributions and aggregate-sample statistics. But you can produce useful
"ersatz" indicators which behave with some useful regularity and some degree
of generalizability.

Anyway, the "reality" of the Elo system in practice is hardly an exemplar of
"measurement" as any physical scientist might know the meaning of that word.
So, let's get back to "talking measurement" rather than "talking
statistical-model rating schemes".

Denny Borsboom

Oct 27, 2010, 4:18:39 PM10/27/10

to talking-m...@googlegroups.com

Dear Paul,

I don't recall having said anything that is at odds with your
observations. I don't recall having claimed ratio scale properties for
anything in psychology. I don't recall having claimed that anything
here is on par with what the physicists do. I don't recall having said
that anything here is quantitative measurement. I don't even recall
having claimed that a physical scientist would take this as
measurement or not. Etc.

I was posing a dilemma that, honestly, keeps me busy. But maybe you
don't have dilemmas. After all, over the decades things have become
clear to you.

Best
Denny

Paul Barrett

Oct 27, 2010, 5:50:14 PM10/27/10

to talking-m...@googlegroups.com

Sigh ...

Denny you said in a previous message.." But my point is stronger: if someone
came up to me and said they can in fact measure - not just test for - chess
ability, I think that person could make a pretty reasonable case, just like
Jack can make a very reasonable case for the Lexile. It's clearly not just
curve fitting or stamp collecting, as there is a control element present
that has a quantitative side to it."

The actual implementation of the Elo, as described on Wikipedia, looks
somewhat like "just curve fitting".

And " After all, over the decades things have become clear to you". Yes,
that is exactly the case - fed largely by Feynman, Michell, Maraun,
Freedman, and Breiman.

I was just looking at Forrest Young's 1996 presentation again just now ...
http://forrest.psych.unc.edu/conferences/PsyMet96OverHeads.pdf

and why he no longer called himself a psychometrician.

I now accept quantitative measurement of psychological attributes will never
work because of the adaptive/biologically self-organizing properties of the
system generating the behavior. You cannot make systematic measurement of a
system which is continuously partially reconstructing itself over time -
within hours if we take hippocampal neurogenesis into account.

So, hello fuzzy-logic, algorithmics, bootstraps, evolved-over-time
simulations, cellular automata, and "good enough" generalizations. Messy,
forever incomplete, but accurate at a gross level of precision (much in the
same way as characterizing a fractal-generated phenomenon).

And that is where we fundamentally differ I suspect, because you appear to
argue that psychometricians have something to add to psychological science
and "measurement". I don't.

Look at what the best psychometricians do these days - analyze exam and
proficiency scores, and new forms of IRT and SEM data models (ideal point,
ESEM etc.). All mostly good for edumetrics and modeling of questionnaire
item responses. But maybe that is what this list is really all about - for
those who mostly work in education and who deal with educational proficiency
assessment?

I think there is a brutal simplicity at work here which few want to
acknowledge because it requires that they re-address how to approach
investigation in a "science" of psychology, rather than in a technological
world of analyzing proficiency or questionnaire responses.

For psychological scientists, Michell said it all 13 years ago. Modern
neuroscience, agent-based AI, machine-learning-algorithmics, complexity and
systems theory, finished the argument for those still listening.

I find it hard to understand what "modern psychometrics" is trying to
achieve within a science of psychology, except if I replace "psychometrics"
with the term "edumetrics". Then it all makes very good sense because
edumetrics specialists are not trying to work as scientists, but as expert
statisticians characterizing/modeling response patterns in a very specific
field of educational and proficiency assessment.

Never mind, the simplicity and new avenues I see for innovation in
understanding/assessing the human are probably fed more by my own stupidity
rather than any insightful intelligence!

Regards .. Paul

Andrew Kyngdon

Oct 27, 2010, 6:22:07 PM10/27/10

to talking-m...@googlegroups.com

Hey Paul,

You stated that "But maybe that is what this list is really all about - for those who mostly work in education and who deal with educational proficiency assessment?"

The list is open to anybody interested in critical appraisal of the attempt at psychological measurement. There are no restrictions to the field of educational assessment.

Your objections based on advances in neuroscience and cognitive science are interesting. I very nearly became a cognitive scientist with an interest in artificial neural nets and AI. But what caused concern for me at this time was that backpropagation algorithms had no biological correlate.

Anyway, if you have written a paper on the issues you addressed in your post, I invite you to discuss it here. It's been a long time since I have read any cognitive or neuroscience.

Cheers,

Andrew

Denny Borsboom

Oct 27, 2010, 6:31:25 PM10/27/10

to talking-m...@googlegroups.com

Paul,

you are quoting me out of context. Below is the context in which I
raised the chess example. I think that much of it is in line with your
comments. The difference is maybe that you aren't impressed with the
predictive and control properties of some models, elo being one of
them, while I don't understand why they work as well as they do. But
if you could explain, that might be really helpful, in contrast to the
psychometrics bashing, which doesn't make sense to me at all.

If you seriously consider the relation between theoretical attributes
and observations in psychology, and if you are involved in crafting
formal models to explicate that relation, then you're a psychometrician,
right?

Best
Denny

PS here is the context:

Paul Barrett

Oct 27, 2010, 8:14:43 PM10/27/10

to talking-m...@googlegroups.com

Denny, let me quote you again from the message below ..

" So what do we do? Do we conclude that chess ability is a continuous
quantity like length, measurable by tests and game performance? I would
hesitate, given the force of the objections under (1). Do we then conclude
that chessometrics is a pathological science, claiming quantitative
structure where there is none and deceiving the chess-playing community,
leading them by Stevens' primrose path in suggesting that Elo ratings are
quantitative? Again, I would hesitate, given the force of the evidence in
(2)."

You would hesitate to suggest that "chessometrics {as characterized by the
Elo system} is a pathological science, claiming quantitative structure where
there is none".

I'm saying "Of course it's not quantitative measurement" ... you only have
to read the list of ad-hoc adjustments being made to "Elo's" to make the
system work as "measurement" to realize it cannot possibly satisfy the
axioms of quantity.

However, the Elo rating system clearly works well, given the various
sensible "corrective" adjustments here and there. That's a substantive
achievement.

==========================================================

As to:

"If you seriously consider the relation between theoretical attributes
and observations in psychology, and if you are involved in crafting
formal models to explicate that relation, then you're a psychometrician,
right?"

Maybe so .. I'm not interested in quibbling over names, but about what the
name implies in terms of the investigative content of that activity. Right
now, all I see are largely proficiency and scholastic aptitude/performance
work from many psychometricians, to which the term edumetrics seems more
appropriate. Might be my poor eyesight causing this misperception?

Right at the heart of my comments is that other issue of whether you can
ever define "quantitative" measurement from "probability-based
aggregate-group effect models", or whether it is in essence deterministic -
its properties must be shown to apply to every measure of every individual
object which is said to be "measurable" in terms of standard-unit, additive
metrics.

Which is going to be most unlikely given a self-organizing, non-stationary
system.

Which is why I use the term "good enough" assessment; an acceptance that
psychological "measurement" must always be fuzzy to some degree.

Perhaps I'm just taking an easy or intellectually lazy way out by not
bothering to formally model "error" anymore, rather just accepting it's
always there as a result of observing such a causally-complex generating
system in action for any sentient individual, and non-stationary over time.

We may be in agreement in some areas, but not I suspect with how we view
invoking formal probabilistic aggregate-data models to explain psychological
phenomena or construct quantitative measures of theoretical attributes.

==========================================================

As to my "psychometrics bashing" - the problem is Denny, I do see
psychometrics as a "pathology of science" for the reasons given by Michell
(and my own thoughts about sentient complex systems); you don't. So whatever
I say will end up as "psychometrics bashing" to you and those who would
argue psychometrics is not a pathology of science.

Regards .. Paul

Denny Borsboom

Oct 28, 2010, 8:25:11 AM10/28/10

to talking-m...@googlegroups.com

Hi Paul,

1) we clearly have a different perception of chess and elo and other
testing situations. You don't share my sense of wonder. I am not sure
whether there's something you know but I don't. I don't see how you
would get the kind of quantitative control that people exercise in
adaptive testing, elo-matching, and lexile item production, if we were
looking at mere orders. If people can balance items and persons, or
match persons to persons as in chess, or control precisely the % of
items a kid will answer correctly, then there must be some
quantitative control element? Especially when there is no induction
involved, i.e. when we are predicting what will happen from design
principles, as when new kids make new items. Please explain to me how
this is possible with just order, and I'll get back in my cage.

2) Yes there is industrial production in the educational testing area.
This isn't particularly exciting in my view, but if you don't like it
you have to watch it? [While I would not agree that much of
psychometrics is pathological, I would certainly agree that much of it
is boring.] Also, as with anything involving education, there is
politics involved in this area, and as with anything involving
business, the answer to 99% of the questions is 'money'. Still, anyone
who condemns the enterprise tout court should consider the
alternatives. Recall how and why Binet made his tests. To give one
example: The evidence in my country suggests that teachers' judgments,
which are often considered the alternative to edumetric tests, are far
more sensitive to ethnicity, cultural background, and sex, than the
edumetrically controlled tests.

Best
Denny

Denny Borsboom

Oct 28, 2010, 9:37:32 AM10/28/10

to talking-m...@googlegroups.com

Hi Jack,

thanks. I'm not sure the temperature example goes through. Even if
temperature is an emergent property and no temperature can be ascribed
to single particles, the relevant analogy in the case of chess ability
would seem to be that you can't ascribe chess ability to a part of the
person, but only to the whole person [analogous to people but not
neurons being smart]. This is true of course but not the issue I
suspect.

Instead, the relevant analogy to a person getting better at chess is a
room getting hotter say from 10 to 20 degrees. In this case the room
will go through all intermediate temperatures between 10 and 20 [i.e.
continuity], will do so linearly [i.e. unidimensionally], and in a
homogeneous way [i.e. by increasing the average kinetic energy of
particles]. Nicely in line with the quantitative nature of physical
magnitudes. Learning chess [or language] isn't like that. At least so
it seems to me; I may be wrong of course.

> Thus the causal action flows from the ability to the strategies and not the reverse. In this view it is not a failing that there is no prescribed progression in strategies either within or between subjects.

I'm not sure whether I understand this. Say that a child forgets a
chess rule. Then his behavioral repertoire is changed discretely in
that there's a discrete, countable set of moves he cannot do anymore?
I don't see how such a change is continuous or linear. I recall that
continuity is considered by the relevant people to be a prerequisite
for quantitative structure, at least that's how it is in the Hoelder
axioms that are cited all the time. [Alternatively the categorical
imposition of the continuity requirement could be a huge mistake].

Best
Denny

Paul Barrett

Oct 28, 2010, 6:23:44 PM10/28/10

to talking-m...@googlegroups.com

-----Original Message-----
From: talking-m...@googlegroups.com [mailto:talking-m...@googlegroups.com] On Behalf Of Denny Borsboom
Sent: Friday, 29 October 2010 1:25 a.m.
To: talking-m...@googlegroups.com
Subject: Re: [talking-measurement] Maraun on Measurement

Hi Paul,

> Hi Paul,
>
> 1) we clearly have a different perception of chess and elo and other
> testing situations. You don't share my sense of wonder. I am not sure
> whether there's something you know but I don't.
>

Denny, this is ridiculous. I showed you several comments from Wikipedia (on which there are more) which seem to indicate that Elo ratings are augmented or truncated for several reasons (age of player, results, etc.) to maintain "accuracy".

E.g. " This update can be performed after each game or each tournament, or after any suitable rating period. An example may help clarify. Suppose Player A has a rating of 1613, and plays in a five-round tournament. He loses to a player rated 1609, draws with a player rated 1477, defeats a player rated 1388, defeats a player rated 1586, and loses to a player rated 1720. His actual score is (0 + 0.5 + 1 + 1 + 0) = 2.5. His expected score, calculated according to the formula above, was (0.506 + 0.686 + 0.785 + 0.539 + 0.351) = 2.867. Therefore his new rating is (1613 + 32· (2.5 − 2.867)) = 1601, assuming that a K factor of 32 is used.

Note that while two wins, two losses, and one draw may seem like a par score, it is worse than expected for Player A because his opponents were lower rated on average. Therefore he is slightly penalized. If he had scored two wins, one loss, and two draws, for a total score of three points, that would have been slightly better than expected, and his new rating would have been (1613 + 32· (3 − 2.867)) = 1617.

This updating procedure is at the core of the ratings used by FIDE, USCF, Yahoo! Games, the ICC, and FICS. **** However, each organization has taken a different route to deal with the uncertainty inherent in the ratings, particularly the ratings of newcomers, and to deal with the problem of ratings inflation/deflation.*** New players are assigned provisional ratings, which are adjusted more drastically than established ratings."
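The arithmetic in the quoted example can be reproduced with a short sketch. The passage refers to "the formula above" without reproducing it, so the code assumes the usual logistic expected-score formula with a 400-point scale; the function names are hypothetical, and the K factor of 32 is the one stated in the quote.

```python
def expected_score(rating, opponent):
    """Expected score against one opponent (assumed logistic Elo formula)."""
    return 1.0 / (1.0 + 10 ** ((opponent - rating) / 400))

def updated_rating(rating, opponents, actual_score, k=32):
    """New rating after a rating period: old rating plus K times
    (actual score minus expected score)."""
    expected = sum(expected_score(rating, o) for o in opponents)
    return rating + k * (actual_score - expected)

# Player A (1613) and the five opponents from the quoted tournament.
opponents = [1609, 1477, 1388, 1586, 1720]
expected_total = sum(expected_score(1613, o) for o in opponents)

new_rating = updated_rating(1613, opponents, 2.5)  # two wins, one draw, two losses
alt_rating = updated_rating(1613, opponents, 3.0)  # two wins, two draws, one loss
```

Rounded to the nearest point, `new_rating` and `alt_rating` agree with the 1601 and 1617 given in the quote, and `expected_total` agrees with 2.867 to within rounding of the individual terms.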

Go to Wikipedia - read about the mathematical formulations and the "choice of K-factors", then how people "adjust" their Elo rating via strategy, and the adjustments being made to ratings (inflation/deflation) ...

None of this seems to give me any great confidence in the Elo as "quantitative measurement" as "Jim Kirk" would know it! However, it's a pretty good/systematic way of ranking chess players - but is not as "precise" as you are claiming.

As to your "If people can balance items and persons, or match persons to persons as in chess, or control precisely the % of items a kid will answer correctly", are you claiming that every child with a fixed amount of attribute X will perform at exactly 70% accuracy on a fixed set of items?

What is the unit of any X, by the way? Is it a quantile, the same unit as in Metametrics' tests, or can it be shown to be a ratio of such?

I'm not sure why you provided this ... none of it addresses any point I made.

> 2) Yes there is industrial production in the educational testing area.
> This isn't particularly exciting in my view, but if you don't like it
> you have to watch it? [While I would not agree that much of
> psychometrics is pathological, I would certainly agree that much of it
> is boring.] Also, as with anything involving education, there is
> politics involved in this area, and as with anything involving
> business, the answer to 99% of the questions is 'money'. Still, anyone
> who condemns the enterprise tout court should consider the
> alternatives. Recall how and why Binet made his tests. To give one
> example: The evidence in my country suggests that teachers' judgments,
> which are often considered the alternative to edumetric tests, are far
> more sensitive to ethnicity, cultural background, and sex, than the
> edumetrically controlled tests.
>

What I said was ... " I find it hard to understand what "modern psychometrics" is trying to achieve within a science of psychology, except if I replace "psychometrics" with the term "edumetrics". Then it all makes very good sense because edumetrics specialists are not trying to work as scientists, but as expert statisticians characterizing/modeling response patterns in a very specific field of educational and proficiency assessment."

Like mathsgarden and the host of ETS, ACT, ACER, PISA, and others - they all do a good practical job of dealing with educational attainment and proficiency scores.

When you can show me that all tested individuals who possess exactly the same amount of attribute X will achieve exactly the same measured outcome (within a degree of known instrument precision, over the entire range of X), using a standard unit metric for X and the outcome, I will acknowledge that the attribute is indeed quantitative for all practical purposes.

You don't need axiomatic measurement theory or any statistical model for this - it's deterministic, in the same way physics didn't need axiomatic theory or statistical models to develop its 'measures'.

What we are dealing with is really very simple, very empirical, and rather obvious. It either works, for all, or it doesn't.

But you don't need it to be "perfect" to do good things with it. And there are very good psychological reasons why "fuzzy order" - even 'sharp fuzzy order' - is about all you might reasonably expect to observe - once you stop treating "a departure from an expected quantitative value" as 'measurement' error.

Mathsgarden and the like clearly look like "near quantitative" - but that still means "fuzzy orders" whose boundaries are "sharp or diffuse" to some degree.

There is no "wonder" in that - it's pretty bleeding obvious when you look at information and designed processing load with a half-decent model of how a cognitive processing system might work, and sufficient experimental data to mine, permitting 'x-attribute magnitude to observed outcome probabilities' to be computed. Lehrl and Fisher were working like this years ago.

Oh well - there we go ..

Regards .. Paul

---------------------------
Advanced Projects R&D Ltd.
---------------------------
W: www.pbarrett.net
E: pa...@pbarrett.net
M: +64-(0)21-415625

Stephen Humphry

Oct 28, 2010, 10:02:10 PM10/28/10

to talking-m...@googlegroups.com

Denny, there are many things that could be commented on with respect to the measurement of temperature here. I just want to pick up on one thing. You say:

Instead, the relevant analogy to a person getting better at chess is a
room getting hotter say from 10 to 20 degrees. In this case the room
will go through all intermediate temperatures between 10 and 20 [i.e.
continuity], will do so linearly [i.e. unidimensionally], and in a
homogeneous way [i.e. by increasing the average kinetic energy of
particles]. Nicely in line with the quantitative nature of physical
magnitudes. Learning chess [or language] isn't like that. At least so
it seems to me; I may be wrong of course.

S: I wouldn't say this is true in the case of any phase transition that involves latent heat.

Steve

Denny Borsboom

Oct 29, 2010, 7:46:01 AM10/29/10

to talking-m...@googlegroups.com

Paul,

> it's pretty bleeding obvious when you look at
> information and designed processing load with a half-decent model of how a
> cognitive processing system might work, and sufficient experimental data to
> mine, permitting 'x-attribute magnitude to observed outcome probabilities'
> to be computed. Lehrl and Fisher were working like this years ago.

Well it isn't pretty bleeding obvious to me. I don't know any
half-decent cognitive models of chess playing that would entail
anything about the psychometrics of individual differences. In fact I
don't think they exist.

It is, however, easy to imagine a world in which chess ability would
be such that you couldn't do any of the things you can do with elo
ratings. It is easy, for instance, to imagine that there were such
severe transitivity violations in chess performance that any kind of
ordering were next to impossible, let alone that we could control the
actual probabilities of winning. Why don't we see any of that? Of
course elo ratings are shabby statistical constructions. That's
precisely the point; you should not be able to do anything
quantitative with them.
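To make the idea of transitivity violations concrete, here is a minimal sketch of one way to look for them in pairwise results. The win probabilities are invented for illustration and the function name is hypothetical; a "violation" is taken to be a triple in which A is favoured over B and B over C, yet C is favoured over A.

```python
from itertools import permutations

def transitivity_violations(p):
    """Return ordered triples (a, b, c) where a is favoured over b and b over c,
    but a is not favoured over c. p[a][b] is the probability that a beats b."""
    players = list(p)
    return [
        (a, b, c)
        for a, b, c in permutations(players, 3)
        if p[a][b] > 0.5 and p[b][c] > 0.5 and p[a][c] < 0.5
    ]

# Hypothetical win probabilities: an orderly world and a cyclic one.
orderly = {
    "A": {"B": 0.7, "C": 0.9},
    "B": {"A": 0.3, "C": 0.7},
    "C": {"A": 0.1, "B": 0.3},
}
cyclic = {
    "A": {"B": 0.7, "C": 0.3},
    "B": {"A": 0.3, "C": 0.7},
    "C": {"A": 0.7, "B": 0.3},
}

orderly_violations = transitivity_violations(orderly)
cyclic_violations = transitivity_violations(cyclic)
```

In the orderly case no triple is flagged, so a single ranking is possible; in the cyclic case every rotation of the cycle is flagged and no ordering of the three players fits the results.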

Your "near quantitative" terminology begs the question with stunning
precision. What makes something "near quantitative" and how does it
become so? What is the difference between "near quantitative" and
"quantitative"? How do you get orders to behave "nearly
quantitatively"? Orders are orders are orders. They don't get to
play in the quantitative game; certainly not when they are just
stochastic.

Best
Denny

Denny Borsboom

Oct 29, 2010, 8:07:02 AM10/29/10

to talking-m...@googlegroups.com

S: I wouldn't say this is true in the case of any phase transition
that involves latent heat.

D: Why not? Phase transition models say that certain states of a
system are unstable, not that they don't exist. So our system may
have stable states at 10 and 20 degrees. That means that if it is
perturbed from 10 upwards, it will move through to 20 and not stay
anywhere in between. A phase transition model doesn't say that it
can't be 15 degrees, just that the system cannot stay 15 degrees,
given its dynamics.

Best
Denny

Stephen Humphry

Oct 29, 2010, 9:24:30 AM10/29/10

to talking-m...@googlegroups.com

Well, you'll need to define what you mean by "linearly [i.e. unidimensionally] ...". Best of luck with that. Steve

Denny Borsboom

Oct 29, 2010, 9:59:29 AM10/29/10

to talking-m...@googlegroups.com

> Well, you'll need to define what you mean by "linearly [i.e.
> unidimensionally] ...". Best of luck with that. Steve

I mean that the topology of temperature changes is a line. Is this
problematic? I thought this is in accord with Hölder's axiom 1 and
also with the fact that temperature figures as a scalar in e.g. the
gas laws. However I think that you're the expert here, so please
correct me if I'm wrong.

D.

Stephen Humphry

Oct 31, 2010, 3:43:06 AM10/31/10

to talking-m...@googlegroups.com

> Well, you'll need to define what you mean by "linearly [i.e.
> unidimensionally] ...". Best of luck with that. Steve

D: I mean that the topology of temperature changes is a line. Is this problematic?

S: Are you serious? Genuine question.

Denny Borsboom

Oct 31, 2010, 10:11:54 AM10/31/10

to talking-m...@googlegroups.com

> S: Are you serious? Genuine question

D: yes. I have a feeling that you're going to enlighten me so please proceed.

Denny Borsboom

Oct 31, 2010, 10:33:30 AM10/31/10

to talking-m...@googlegroups.com

possible miscommunication: of course I didn't mean that you can't
describe temperature changes in, say, a 4 dimensional space for 3d +
time; this is natural. What I mean is that for temperature you'd need
only one such representation.
best
denny

Stephen Humphry

Nov 1, 2010, 2:43:40 AM11/1/10

to talking-m...@googlegroups.com

> S: Are you serious? Genuine question

D: yes. I have a feeling that you're going to enlighten me so please proceed.

S: That's OK thanks: if you think the 'topology of temperature change' means something, I'm reasonably certain we're not going to get anywhere ...

Denny Borsboom

Nov 1, 2010, 5:52:04 AM11/1/10

to talking-m...@googlegroups.com

> S: That's OK thanks: if you think the 'topology of temperature change' means
> something, I'm reasonably certain we're not going to get anywhere ...

D: why?
