On Fri, 2 Jul 2021 02:45:09 -0700 (PDT), Cosine <ase...@gmail.com> wrote:
>Rich Ulrich wrote on Friday, July 2, 2021 at 2:18:29 PM [UTC+8]:
>> On Fri, 2 Jul 2021 02:46:44 +0000 (UTC), David Duffy
>> Ranking of results can raise the question of whether 1>2
>> and 2>3 always implies 1>3; but you might have skipped that
>Well, that is also an issue.
No, using CIs was not what I was thinking of. I don't remember the
details, but there are /some/ complicated comparisons that are not
transitive when you do the brute comparisons by pairs.
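A toy illustration (Python; these dice are my own example, not
necessarily the cases I was thinking of): the pairwise "which tends
to come out larger" comparisons that rank-based tests estimate can
be intransitive.

# Illustration: pairwise "stochastic" comparisons can be intransitive.
# These three dice are the classic example.
from itertools import product

A = [2, 4, 9]   # each face equally likely
B = [1, 6, 8]
C = [3, 5, 7]

def p_beats(x, y):
    """P(a draw from x exceeds a draw from y), by brute enumeration."""
    wins = sum(a > b for a, b in product(x, y))
    return wins / (len(x) * len(y))

print(p_beats(A, B))  # 5/9 ~ 0.56 -> A "beats" B
print(p_beats(B, C))  # 5/9 ~ 0.56 -> B "beats" C
print(p_beats(C, A))  # 5/9 ~ 0.56 -> yet C "beats" A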
One awkward scoring that I do recall something about is the
scoring for women's Olympic Ice Skating. Skaters are ranked
in each of several events. Those rank-scores are later combined
(in some fashion... weighting?) to get a final ranking to determine
a winner. I watched a competition where, at the time that the
final skater did her final event, it was possible that (IIRC) any of
the three skaters at the top could end up as #1, #2, or #3.
>We actually ran tests on samples and got a result showing that 1>2 w/ 95% confidence.
Stating such-and-so "with 95% confidence" is phrasing that will
grate on a large number of good statisticians. The parameter
(or difference) is not the proper object of "95%"; that describes
the CI. You can find some classic quotes on this in the Wiki
article at https://en.wikipedia.org/wiki/Confidence_interval
under "Misunderstandings". By the way, the article (all in all) could
benefit from expert re-writing, as mentioned in the head-notes
by Wiki overseers.
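A quick simulation sketch (Python; the normal data, sample size, and
number of repetitions are just illustrative) of what the 95% does
describe: the long-run coverage of the interval-making procedure, not
a probability attached to any one parameter value.

# Sketch: "95%" describes the CI procedure's long-run coverage,
# not any single interval or parameter.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
true_mu, n, reps = 10.0, 25, 10_000
covered = 0
for _ in range(reps):
    x = rng.normal(true_mu, 3.0, size=n)
    se = x.std(ddof=1) / np.sqrt(n)
    lo, hi = stats.t.interval(0.95, df=n - 1, loc=x.mean(), scale=se)
    covered += (lo <= true_mu <= hi)
print(covered / reps)   # ~0.95: how often the procedure captures mu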
>The same for 2>3 w/ 95%.
>But we did NOT do any test to get an actual result showing that 1>3 w/ some confidence.
>Does it mean that we still need to test whether 1>3 and get the statistical confidence?
>Or are there some ways to show that 1>3 w/ some confidence based on the results of
>1>2 w/ 95% and 2>3 w/ 90%?
I will mention another ranking complication. When you use SNK
(Student-Newman-Keuls) for "post-hoc" range testing, the formal
derivation requires that you test from the outside, heading in.
If the extremes do not differ, you never test the middle value.
Of course, the SNK tests here use different cutoff values when
comparing low to next / low to high.
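Here is a minimal sketch of that step-down logic for three equal-size
groups (Python; the data, alpha = .05, and the three-group setup are
my own illustration, not a general-purpose implementation).

# Minimal SNK sketch for k = 3 equal-size groups (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
groups = [rng.normal(m, 2.0, size=12) for m in (10.0, 11.0, 13.0)]
n = len(groups[0])
means = sorted(g.mean() for g in groups)              # low, mid, high
df_err = sum(len(g) - 1 for g in groups)              # error df, one-way ANOVA
mse = sum((len(g) - 1) * g.var(ddof=1) for g in groups) / df_err
se = np.sqrt(mse / n)

def q_crit(r):
    # studentized-range cutoff for a span of r ordered means
    return stats.studentized_range.ppf(0.95, r, df_err)

# Step 1: extremes first, span r = 3
if (means[2] - means[0]) / se > q_crit(3):
    # Step 2: only then test the adjacent pairs, with the smaller r = 2 cutoff
    print("low vs high differ")
    print("low vs mid :", (means[1] - means[0]) / se > q_crit(2))
    print("mid vs high:", (means[2] - means[1]) / se > q_crit(2))
else:
    # extremes don't differ -> the middle comparisons are never tested
    print("no differences declared")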
If you are "merely" using several two-group tests, then here is a
place where paradoxes might seem to arise: two-group tests, with
extreme differences in variance, and groups of vastly different size.
Oh, and when there are "paired" measurements, your correlations
may differ and that can have consequences.
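A sketch of the sort of situation I mean (Python; the group sizes and
variances are invented for the example): with a small, noisy group
against a large, quiet one, the pooled-variance and separate-variance
(Welch) t-tests can give markedly different answers, and with extreme
enough settings the conclusions can flip.

# Sketch: with very unequal variances and group sizes, the pooled
# and separate-variance (Welch) t-tests can disagree sharply.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
small_noisy = rng.normal(1.0, 10.0, size=8)     # small n, big variance
big_quiet   = rng.normal(0.0,  1.0, size=200)   # big n, small variance

print(stats.ttest_ind(small_noisy, big_quiet, equal_var=True))   # pooled
print(stats.ttest_ind(small_noisy, big_quiet, equal_var=False))  # Welch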
If your question gets reduced to a question of how /this/
test behaves, comparing A to B, B to C, and inferring A vs C:
you probably can set limits showing, for your question above,
that A has to differ from C (for that test), even when using
"p-level" as an effect-size indicator. The demonstration may
be different for "pooled variance" tests and "separate variance" tests.
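One way such a demonstration can go for the pooled case (a sketch,
under my simplifying assumptions of equal n and one MSE pooled across
all three groups): the three pairwise t statistics then share one
standard error, so t(A,C) = t(A,B) + t(B,C) exactly, and if both
pieces clear the cutoff, the A-vs-C statistic is at least twice the
cutoff.

# Sketch (my assumptions: equal n, one MSE pooled over all three groups).
# With a common SE the t statistics add exactly, so if A-vs-B and
# B-vs-C both exceed the cutoff, A-vs-C must exceed twice the cutoff.
import numpy as np

rng = np.random.default_rng(3)
n = 20
A, B, C = (rng.normal(m, 1.0, size=n) for m in (2.0, 1.0, 0.0))

df_err = 3 * (n - 1)
mse = sum((n - 1) * g.var(ddof=1) for g in (A, B, C)) / df_err
se = np.sqrt(2 * mse / n)                       # same SE for every pair

t_ab = (A.mean() - B.mean()) / se
t_bc = (B.mean() - C.mean()) / se
t_ac = (A.mean() - C.mean()) / se
print(t_ab + t_bc, t_ac)                        # identical, up to rounding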