On Sat, 18 Mar 2023 01:25:44 -0000 (UTC), "David Jones"
<dajh...@nowherel.com> wrote:
>Cosine wrote:
>
>> Hi:
>>
>> It is easy to find studies in the literature that use more than
>> one performance metric for a hypothesis test without explicitly and
>> clearly stating what hypothesis the study aims to test.
That sounds like a journal with reviewers who are not doing their job.
A new method may have better sensitivity or specificity, making it
useful as a second test. If it is cheaper/easier, that virtue might
justify slight inferiority. If it is more expensive, there should be
a gain in accuracy to justify its application (or, it deserves further
development).
>> Often the
>> paper only states that it intends to test whether a newly developed
>> object (algorithm, drug, device, technique, etc.) performs better
>> than some chosen benchmarks. Then the paper presents some tables
>> summarizing the results of many comparisons. From those tables, the
>> paper picks the comparisons that show better values on some
>> performance metric and reach statistical significance. Finally, the
>> paper claims that the new object is successful because it has some
>> favorable results that are statistically significant.
>>
>> This looks odd. Shouldn't we clearly define the hypothesis before
>> conducting any tests? For example, shouldn't we define the success of
>> the object as "having all the chosen metrics show better results"?
>> Otherwise, why would we test so many metrics, instead of only one?
>>
>> The aforementioned approach amounts to this: we do not know what
>> will happen, so let's pick some commonly used metrics and see whether
>> some of them show favorable and significant results.
I am not comfortable with your use of the word 'metrics' -- I like
to think of improving the metric of a scale by taking a power
transformation, like the square root for Poisson counts, etc.
Or, your metric for measuring 'size' might be area, volume, weight....
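
To illustrate what I mean by the metric of a scale, here is a minimal
sketch in Python (the lambda values are arbitrary examples): the
square root is approximately variance-stabilizing for Poisson counts,
so var(sqrt(X)) sits near 1/4 no matter the mean, while var(X) grows
with it.

import numpy as np

# Square-root transform as an approximate variance stabilizer for
# Poisson counts: by the delta method, Var(sqrt(X)) ~ 1/4 regardless
# of the mean, while Var(X) = lambda grows with it.
rng = np.random.default_rng(0)
for lam in (4, 16, 64):                  # arbitrary example means
    x = rng.poisson(lam, size=100_000)
    print(f"lambda={lam:3d}  var(X)={x.var():7.2f}  "
          f"var(sqrt(X))={np.sqrt(x).var():.3f}")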
>>
>> Anyway, what are the correct or rigorous ways to conduct tests
>> with multiple metrics?
>
>You might want to search for the terms "multiple testing" and
>"Bonferroni correction".
That answers the final question -- assuming that you do have
some stated hypothesis or goal.
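
To make the Bonferroni correction concrete, here is a minimal sketch
in Python; the p-values are made-up placeholders, one per performance
metric. With m tests at family-wise level alpha, each raw p-value is
compared against alpha/m, or equivalently multiplied by m and capped
at 1:

# Bonferroni correction for m simultaneous tests at level alpha.
# The p-values below are placeholders, one per performance metric.
p_values = [0.012, 0.030, 0.004, 0.250]
alpha = 0.05
m = len(p_values)

for p in p_values:
    p_adj = min(1.0, p * m)          # Bonferroni-adjusted p-value
    print(f"raw p={p:.3f}  adjusted p={p_adj:.3f}  "
          f"reject H0 at {alpha}: {p_adj < alpha}")

Less conservative procedures (Holm, Benjamini-Hochberg) follow the
same pattern and are also worth searching for.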
--
Rich Ulrich