Jun 29, 2021, 7:47:40 AM

Hi:

How do we conduct statistical tests to find the best screening method among a set of methods?

For example, we have 3 new methods of screening. We tested them in the same group of patients and verified the screening result of each method against a clinical standard method. Will we be sure to find the best method in the following way?

1>2 ^ 1>3 -> 1 the best
1>2 ^ 1<3 -> 3 the best
1<2 ^ 1>3 -> 2 the best
1<2 ^ 1<3 ^ 2>3 -> 2 the best
1<2 ^ 1<3 ^ 2<3 -> 3 the best

Are there other easier/faster ways to find the best method?
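
Since every patient is screened by all of the methods, each pairwise comparison (such as 1>2) is a paired comparison. A minimal sketch of one such comparison, assuming each screening result is scored as correct or incorrect against the clinical standard, is an exact McNemar test on the discordant pairs. The function name and the data below are hypothetical, and running all three pairwise tests would still call for a multiple-comparison adjustment.

```python
from math import comb


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar test on paired correct/incorrect flags (1 or 0)."""
    # Discordant pairs: patients where exactly one method agreed with the standard.
    only_a = sum(1 for a, b in zip(correct_a, correct_b) if a and not b)
    only_b = sum(1 for a, b in zip(correct_a, correct_b) if b and not a)
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0
    k = min(only_a, only_b)
    # Under H0, each discordant pair favors either method with probability 1/2.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)


# Hypothetical data: 1 = method agreed with the clinical standard, 0 = it did not.
method_1 = [1] * 18 + [0] * 2
method_2 = [1] * 10 + [0] * 10
only_1, only_2, p_value = mcnemar_exact(method_1, method_2)
print(only_1, only_2, p_value)
```

With these made-up data, method 1 is correct on 8 patients where method 2 is not, and never the other way round, which gives a two-sided p-value of 2/256. Only the discordant patients carry information about the difference, which is why the paired design matters.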


Jun 30, 2021, 5:39:00 PM

Cosine wrote on Tuesday, June 29, 2021, at 7:47:40 PM [UTC+8]:

A related question: how do we know the level of confidence?

For example, we have methods 1, 2, and 3.

1>2 with 95% confidence and 1>3 with 90% confidence

then 1 is the best by logic, but with what statistical confidence?

Even if we have 1>2 w/ 95% and 1>3 w/ 95%, could we be sure that 1 is the best with 95%? Why?
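
One rough way to put a number on the combined statement is the Bonferroni (union) bound. The sketch below assumes that "95% confidence" can be read loosely as "this comparison has at most a 5% chance of being wrong", which is an assumption about the wording rather than a strict interpretation; the function names are hypothetical. The bound holds whether or not the comparisons are independent, but it is only a lower bound.

```python
def joint_confidence_lower_bound(confidences):
    """Bonferroni (union) bound on the chance that every statement holds at once."""
    total_error = sum(1.0 - c for c in confidences)
    return max(0.0, 1.0 - total_error)


def per_comparison_level(target, n_comparisons):
    """Level each comparison needs so that the Bonferroni bound reaches `target`."""
    return 1.0 - (1.0 - target) / n_comparisons


print(joint_confidence_lower_bound([0.95, 0.90]))  # about 0.85
print(joint_confidence_lower_bound([0.95, 0.95]))  # about 0.90
print(per_comparison_level(0.95, 2))               # about 0.975
```

Under that reading, two comparisons at 95% each only guarantee about 90% for "1 is the best", and each comparison would have to be run at about 97.5% to guarantee 95% overall.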


Jul 1, 2021, 10:46:51 PM

Cosine <ase...@gmail.com> wrote:


> Cosine wrote on Tuesday, June 29, 2021, at 7:47:40 PM [UTC+8]:
>> How do we conduct statistical tests to find the best screening method
>> among a set of methods?
>> For example, we have 3 new methods of screening. We tested them in

> Even if we have 1>2 w/ 95% and 1>3 w/ 95%, could we be sure that 1 is the best with 95%? Why?

"Best" depends on the setting: sensitivity (Sens) may be more important
than specificity (Spec) for a screen, so you need to test the screen and
follow-ups simultaneously versus cost-benefit. You can give a likelihood
to each ordering, so you can say, for example, that 1-2-3 is 5x more
likely than 2-1-3.
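
As a rough stand-in for attaching a likelihood to each ordering, the sketch below bootstraps the patients and counts how often each ordering of the methods comes out on top. This is a swapped-in technique (bootstrap frequencies, not a formal likelihood), it ranks by the number of results that agree with the clinical standard rather than by a cost-benefit score, and the function name and data are hypothetical.

```python
import random
from collections import Counter


def ordering_frequencies(correct, n_boot=2000, seed=0):
    """correct: dict of method name -> list of 0/1 flags for the same patients."""
    rng = random.Random(seed)
    names = sorted(correct)
    n = len(correct[names[0]])
    counts = Counter()
    for _ in range(n_boot):
        # Resample patients (not methods) so the pairing across methods is kept.
        idx = [rng.randrange(n) for _ in range(n)]
        acc = {m: sum(correct[m][i] for i in idx) for m in names}
        # Order methods from most to fewest correct results; ties use name order.
        order = tuple(sorted(names, key=lambda m: (-acc[m], m)))
        counts[order] += 1
    return {order: c / n_boot for order, c in counts.items()}


# Hypothetical data: 1 = method agreed with the clinical standard, 0 = it did not.
correct = {
    "1": [1] * 34 + [0] * 6,
    "2": [0] * 8 + [1] * 32,
    "3": [1, 0] * 20,
}
freqs = ordering_frequencies(correct)
for order, freq in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print("-".join(order), freq)
```

The ratio of two of these frequencies gives a statement of the form "ordering 1-2-3 came up k times as often as 2-1-3" for the data at hand.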

Jul 2, 2021, 2:18:29 AM

for "best combination of MULTIPLE considerations" for

statistical decision-making.

Three competitors instead of two.

"Best screening" combines Sens and Spec, with cost-benefit
(as David notes), along with the choice of population to sample.
And the "cost" can be concrete, in dollars per test, or it can be
as subjective as the "benefit", starting out as the projected
number of cases missed or mis-attributed.

The cost-benefit must incorporate the purpose of the
decision-making for the particular sample. "Ideal" screening
varies between samples with low and high prevalence.
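
To make the prevalence point concrete, here is a minimal sketch of an expected cost per person screened. Every number in it (the Sens/Spec pairs, the costs of a missed case and of a false alarm) is hypothetical, and the function names are made up for the illustration.

```python
def expected_cost(sens, spec, prevalence, cost_fn, cost_fp, cost_test=0.0):
    """Expected cost per person screened."""
    missed = prevalence * (1.0 - sens)                # false negatives per person
    false_alarms = (1.0 - prevalence) * (1.0 - spec)  # false positives per person
    return cost_test + cost_fn * missed + cost_fp * false_alarms


def best_method(methods, prevalence, cost_fn, cost_fp):
    """Name of the method with the lowest expected cost at this prevalence."""
    costs = {name: expected_cost(se, sp, prevalence, cost_fn, cost_fp)
             for name, (se, sp) in methods.items()}
    return min(costs, key=costs.get)


# Hypothetical methods as (Sens, Spec): A is more sensitive, B is more specific.
methods = {"A": (0.95, 0.80), "B": (0.80, 0.95)}
for prevalence in (0.01, 0.30):
    print(prevalence, best_method(methods, prevalence, cost_fn=100.0, cost_fp=10.0))
```

With these made-up costs, the more specific method B has the lower expected cost at 1% prevalence, and the more sensitive method A has the lower expected cost at 30% prevalence.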

Ranking of results can raise the question of whether 1>2
and 2>3 always implies 1>3; but you might have skipped that
complication.
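
Pairwise "better than" relations need not be transitive. As a purely illustrative sketch (not screening data), take three hypothetical score distributions, written here as six-sided dice, and read "X > Y" as "X gives the higher score more than half the time".

```python
from fractions import Fraction
from itertools import product


def prob_beats(x, y):
    """Exact probability that a random draw from x exceeds a random draw from y."""
    wins = sum(1 for a, b in product(x, y) if a > b)
    return Fraction(wins, len(x) * len(y))


# Hypothetical score distributions (a well-known set of non-transitive dice).
A = [2, 2, 4, 4, 9, 9]
B = [1, 1, 6, 6, 8, 8]
C = [3, 3, 5, 5, 7, 7]

print(prob_beats(A, B), prob_beats(B, C), prob_beats(C, A))
```

Each of the three probabilities is 5/9, so A > B and B > C, and yet C > A.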

--

Rich Ulrich

Jul 2, 2021, 5:45:12 AM

Rich Ulrich wrote on Friday, July 2, 2021, at 2:18:29 PM [UTC+8]:


> On Fri, 2 Jul 2021 02:46:44 +0000 (UTC), David Duffy
> ...
> Ranking of results can raise the question of whether 1>2
> and 2>3 always implies 1>3; but you might have skipped that
> complication.

Well, that is also an issue.

We actually tested by samples to get the result showing that 1>2 w/ 95% confidence.

The same for 2>3 w/ 95%.

But we did NOT do any test to get an actual result showing that 1>3 w/ some confidence.

Does this mean that we still need to test whether 1>3 and obtain the statistical confidence?

Or are there ways to show that 1>3 w/ some confidence based on the results of 1>2 w/ 95% and 2>3 w/ 90%?

Jul 2, 2021, 10:29:09 PM

On Fri, 2 Jul 2021 02:45:09 -0700 (PDT), Cosine <ase...@gmail.com>
wrote:

>Rich Ulrich wrote on Friday, July 2, 2021, at 2:18:29 PM [UTC+8]:
>> On Fri, 2 Jul 2021 02:46:44 +0000 (UTC), David Duffy
>> ...
>> Ranking of results can raise the question of whether 1>2
>> and 2>3 always implies 1>3; but you might have skipped that
>> complication.
>>
>
>Well, that is also an issue.

No, using CI's was not what I was thinking of. I don't remember
details, but there are /some/ complicated comparisons that are not
transitive, doing the brute comparisons by pairs.

One awkward scoring that I do recall something about is the
scoring for women's Olympic Ice Skating. Skaters are ranked
in each of several events. Those rank-scores are later combined
(in some fashion... weighting?) to get a final ranking to determine
a winner. I watched a competition where, at the time that the
final skater did her final event, it was possible that (IIRC) any of
the three skaters at the top could end up as #1, #2, or #3.

>
>We actually tested by samples to get the result showing that 1>2 w/ 95% confidence.
>

Stating such-and-so "with 95% confidence" is a syntax that will
grate with a large number of good statisticians. The parameter
(or difference) is not the proper object of "95%"; that describes
the CI. You can find some classical quotes on this in the Wiki
article at https://en.wikipedia.org/wiki/Confidence_interval , under
"misunderstandings". By the way, the article (all in all) could
benefit from expert re-writing, as mentioned in the head-notes
by Wiki overseers.

>The same for 2>3 w/ 95%.
>
>But we did NOT do any test to get an actual result showing that 1>3 w/ some confidence.
>
>Does this mean that we still need to test whether 1>3 and obtain the statistical confidence?
>
>Or are there ways to show that 1>3 w/ some confidence based on the results of
>1>2 w/ 95% and 2>3 w/ 90%?

I will mention another ranking complication. When you use SNK for
"post-hoc" range testing, the formal derivation requires that you
test from the outside, heading in. If the extremes do not differ,
you never test the middle value. Of course, the SNK "tests" here use
different cutoff values when comparing low to next / low to high.

If you are "merely" using several two-group tests, then here is a
place where paradoxes might seem to arise: two-group tests, with
extreme differences in variance, and groups of vastly different size.
Oh, and when there are "paired" measurements, your correlations
may differ and that can have consequences.

If your question gets reduced to a question of How does /this/
test behave, comparing A to B, B to C, and inferring A vs C:
You probably can set limits showing for your question above
that A has to differ from C (for that test), even when using
"p-level" as an effect-size indicator. The demonstration may
be different for "pooled variance" tests and "separate variance"
tests.

--
Rich Ulrich
