Q interpretations for different types of comparisons

Cosine

Feb 11, 2023, 11:07:28 AM
Hi:

We have a new method, A, and some benchmarks: B1, B2, and B3.

We compare the performances of the above methods. Each comparison uses a two-sided test.

Are the first two types of comparisons identical?

Is the interpretation of type-3 correct?

Type-1:
all significant: A > B1, A > B2, and A > B3 => claim: A is superior to all benchmarks, i.e., A is the best among all these 4 methods.

Type-2:

All significant: B1 > B2 and B1 > B3 => B1 is the best among all B's.
Significant: A > B1 => A is superior to all benchmarks, i.e., A is the best among all these 4 methods.

Type-3:

All significant: B1 > B2 and B1 > B3 => B1 is the best among all B's.
Non-significant: A > B1 => accepting H0, i.e., the performances of A and B1
do not differ => A is better than B2 and B3.
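
A minimal Python sketch of the first two decision schemes, to make the logic concrete. The score arrays and the choice of Welch's two-sided t-test are assumptions for illustration; the question above does not name the test or the data.

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
scores = {
    "A":  rng.normal(0.85, 0.05, 30),   # made-up performance scores
    "B1": rng.normal(0.80, 0.05, 30),
    "B2": rng.normal(0.75, 0.05, 30),
    "B3": rng.normal(0.74, 0.05, 30),
}

def better(x, y, alpha=0.05):
    # x "beats" y: higher sample mean AND a significant two-sided Welch t-test
    p = stats.ttest_ind(x, y, equal_var=False).pvalue
    return x.mean() > y.mean() and p < alpha

# Type-1: A beats every benchmark directly.
type1 = all(better(scores["A"], scores[b]) for b in ("B1", "B2", "B3"))

# Type-2: B1 beats the other benchmarks, then A beats B1.
type2 = (all(better(scores["B1"], scores[b]) for b in ("B2", "B3"))
         and better(scores["A"], scores["B1"]))

print("Type-1 claim supported:", type1)
print("Type-2 claim supported:", type2)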

Rich Ulrich

Feb 11, 2023, 2:43:23 PM
On Sat, 11 Feb 2023 08:07:26 -0800 (PST), Cosine <ase...@gmail.com>
wrote:

>Hi:
>
> We have a new method, A, and some benchmarks: B1, B2, and B3.
>
> We compare the performances of the above methods. Each comparison uses a two-sided test.
>
> Are the first two types of comparisons identical?
>
> Is the interpretation of type-3 correct?
>
> Type-1:
> all significant: A > B1, A > B2, and A > B3 => claim: A is superior to all benchmarks, i.e., A is the best among all these 4 methods.

Clearly -


>
> Type-2:
>
> All significant: B1 > B2 and B1 > B3 => B1 is the best among all B's.
> Significant: A > B1 => A is superior to all benchmarks, i.e., A is the best among all these 4 methods.

Not entirely CLEARLY. Have you ever drawn lines that underline
the 'not-different' groups, for post-hoc testing? The basic theory,
which gives no inconsistencies in real data, assumes that Ns and
variances are equal. Real data can yield 'weird' results if you look
at the separate two-group tests; so, the recommended algorithms
perform two-group tests that use the all-group variance, and fake
the group Ns to be the same.

So, this is "True" by inference which assumes 'nothing weird is
happening.'
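
One standard procedure of that kind is Tukey's HSD, which compares every pair of groups using the pooled within-group variance. A minimal sketch with scipy.stats.tukey_hsd (SciPy >= 1.8) on made-up data; the groups and their means are assumptions for illustration.

import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
a  = rng.normal(0.85, 0.05, 30)
b1 = rng.normal(0.80, 0.05, 30)
b2 = rng.normal(0.75, 0.05, 30)
b3 = rng.normal(0.74, 0.05, 30)

# All pairwise comparisons at once, using the common within-group variance.
res = stats.tukey_hsd(a, b1, b2, b3)
print(res)          # table of mean differences, confidence intervals, p-values
print(res.pvalue)   # 4x4 matrix of p-values, indexed in the order passed in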

I've done testing against a benchmark which entailed paired tests;
for paired data, 'nothing weird' also assumes that the r's are not
different.
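
A sketch of that paired setup, assuming each method is scored on the same set of cases so the comparison is a two-sided paired t-test; the per-case scores and the shared "difficulty" term that induces the correlation are made up.

import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n_cases = 40
difficulty = rng.normal(0.0, 0.10, n_cases)      # shared per-case effect -> correlated scores
score_a  = 0.85 + difficulty + rng.normal(0, 0.03, n_cases)
score_b1 = 0.80 + difficulty + rng.normal(0, 0.03, n_cases)

t, p = stats.ttest_rel(score_a, score_b1)        # two-sided paired t-test on the same cases
print(f"paired t = {t:.2f}, two-sided p = {p:.4f}")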

>
> Type-3:
>
> All significant: B1 > B2 and B1 > B3 => B1 is the best among all B's.
> Non-significant: A > B1 => accepting H0, i.e., the performances of A and B1
> do not differ => A is better than B2 and B3.

No. This doesn't even depend on variances and Ns.

It is easy to imagine that B1 is slightly better than A, though
not significantly so; and the difference is enough so that A is not
'better' (significantly) than B2 and B3. This is a common picture
in post-hoc drawings: (B1, A) underlined together as not-different,
and (A, B2, B3) underlined together as not-different.
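
A small simulation sketch of that picture: compute all pairwise two-sided tests and report which pairs are not significantly different. The means are made up so that B1 sits slightly above A; whether the exact (B1, A) / (A, B2, B3) grouping appears in any particular run depends on the random draw and the sample size.

import itertools
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
groups = {
    "A":  rng.normal(0.790, 0.05, 30),
    "B1": rng.normal(0.800, 0.05, 30),
    "B2": rng.normal(0.750, 0.05, 30),
    "B3": rng.normal(0.745, 0.05, 30),
}

alpha = 0.05
for (n1, x1), (n2, x2) in itertools.combinations(groups.items(), 2):
    p = stats.ttest_ind(x1, x2, equal_var=False).pvalue
    verdict = "not different" if p >= alpha else "different"
    print(f"{n1} vs {n2}: p = {p:.3f} -> {verdict}")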

--
Rich Ulrich
