Oct 9, 2021, 8:08:55 PM

Hi:

Suppose we did a study that tested the effects of drugs A and B, and of a placebo, in treating disease Z. We could use a t-test on the outcomes for A and B to see whether the difference between the two drugs is statistically significant. The formula requires the sample means, standard errors, and sample sizes of the two groups, A and B.

Now, suppose we found another study that tested the effects of drugs C and D, and of a placebo, in treating disease Z. Could we use the t-test again to determine whether drug A differs from drug C, and drug A from drug D, in treating disease Z, given only that summary information and not the raw data?

Thank you,

Oct 9, 2021, 8:50:23 PM

Cosine <ase...@gmail.com> wrote:

> Suppose we did a study. In this study, we tested the effects of drugs A and B

> Now, suppose we found another study that tested the effects of drugs C and D

See "network meta-analysis".
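
For readers unfamiliar with the term: the simplest building block of a network meta-analysis is an adjusted indirect comparison through a common comparator, here the placebo arm that both studies share. A minimal sketch, assuming each study reports a drug-minus-placebo effect with its standard error, and assuming the two studies are independent and similar enough to combine (a strong assumption). The function name and all numbers are made up for illustration.

```python
import math

def indirect_comparison(d_a, se_a, d_c, se_c):
    """Indirect A-vs-C comparison through a shared placebo comparator.

    d_a, se_a: effect of A minus placebo (study 1) and its standard error.
    d_c, se_c: effect of C minus placebo (study 2) and its standard error.
    Returns (difference, SE of difference, z statistic, two-sided p-value).
    """
    diff = d_a - d_c
    # Independent studies: the variances of the two contrasts add.
    se = math.sqrt(se_a ** 2 + se_c ** 2)
    z = diff / se
    # Two-sided p-value from the normal approximation.
    p = math.erfc(abs(z) / math.sqrt(2))
    return diff, se, z, p

# Hypothetical inputs, not taken from any real study.
diff, se, z, p = indirect_comparison(d_a=5.0, se_a=1.5, d_c=2.0, se_c=2.0)
print(f"A - C = {diff:.2f}, SE = {se:.2f}, z = {z:.2f}, p = {p:.3f}")
```

Because the two variances add, the indirect estimate is always less precise than either placebo-controlled contrast on its own.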
Oct 10, 2021, 8:23:13 AM

What if the purpose is to compare the drug A published in paper 1, drug B in paper 2, and so on?

Could we again use the t-test for comparing the data from different papers?

Oct 10, 2021, 2:14:27 PM

On Sat, 9 Oct 2021 17:08:53 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

>Hi:

>

> Suppose we did a study. In this study, we tested the effects of drugs A and B and of the placebo to treat the disease Z. We could use the t-test statistic of the random variables A and B to see if the difference between the two drugs is statistically significant. The formula requires the sample means, standard errors, and the numbers of samples of the two samples of the drug A and B.

>

> Now, suppose we found another study that tested the effects of drugs C and D and of the placebo to treat the disease Z. Could we determine if there are differences in treating the disease Z between drug A and C and between drug A and D again by using the t-test statistic, given only those sample information but not the raw data?

>

Most studies only test ONE drug against placebo. They

care about one drug, and they want all their "power" to

go to that comparison.

For the purpose of your question, comparing A to C

(or to D), you would be looking at the performance

of each drug in comparison to pbo.

Describing the studies as having "two drugs" is a red

herring, or it is a non-informative complication.

Here is a modern form of your question, of current interest --

If one Covid vaccine shows 95% protection in its main study

and another vaccine shows 90% protection in its study, can

we conclude that the first is better than the second? What

about, compared to 80%?

Well, as a mechanical proposition, we certainly can take the

estimates and their SEs and generate a test. But we KNOW

that the samples differed (location; age/sex/ethnicity?). If they

were in a different time frame (or, even if not), maybe they

were tested against a different dominant mutation of the virus.

The instructions for case-ascertainment may have differed.

And so on.
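
A minimal sketch of that "mechanical" test, assuming efficacy is defined as 1 minus the relative risk and that the comparison is made on the log relative-risk scale with the usual large-sample standard error. The case counts below are invented so that the efficacies come out at 95% and 90%; they are not from any real trial. The test addresses sampling error only and none of the differences between the samples listed above.

```python
import math

def log_rr(cases_vax, n_vax, cases_pbo, n_pbo):
    """Return (log relative risk, approximate standard error of the log RR)."""
    rr = (cases_vax / n_vax) / (cases_pbo / n_pbo)
    se = math.sqrt(1 / cases_vax - 1 / n_vax + 1 / cases_pbo - 1 / n_pbo)
    return math.log(rr), se

# Invented counts (hypothetical), chosen only to give 95% and 90% efficacy.
lrr1, se1 = log_rr(cases_vax=8, n_vax=18000, cases_pbo=160, n_pbo=18000)
lrr2, se2 = log_rr(cases_vax=20, n_vax=15000, cases_pbo=200, n_pbo=15000)

ve1 = 1 - math.exp(lrr1)  # efficacy in study 1
ve2 = 1 - math.exp(lrr2)  # efficacy in study 2

# z-test on the difference of the two independent log relative risks.
z = (lrr1 - lrr2) / math.sqrt(se1 ** 2 + se2 ** 2)
p = math.erfc(abs(z) / math.sqrt(2))  # two-sided, normal approximation
print(f"VE1 = {ve1:.2f}, VE2 = {ve2:.2f}, z = {z:.2f}, p = {p:.3f}")
```

With these made-up counts the two-sided p-value comes out at roughly 0.1, even though each study involves thousands of participants, because the comparison rests on the small number of cases in the vaccinated arms.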

95% vs 90% is based on small enough numbers that, if p < 0.05,

it probably is not p< 0.001 (or better). So that "tested" difference

is unpersuasive. We /know/ that uncontrolled factors /exist/

and thus could be responsible. For establishing one is better,

a test is necessary but not sufficient. We would have heard more

if one of the vaccines had come in at only (say) 75%, which

a-priori, before the studies, based on flu vaccines, did not seem

like a terrible efficacy.

We want to see an "effect size" large enough that it is unlikely

to have happened by chance. If those "confounding factors"

seem small, or if they exist such that they would bias /against/

the better performing drug, then a test on their difference

showing a bigger difference can be a bit persuasive. There's

all those (educated) readers whom you have to convince.

For Covid, they seem to use all three obvious criteria --

getting symptoms, getting hospitalized, dying. A vaccine

does look better if it looks better on all three criteria.

Performance in whole populations (states, countries) also

washes out the idiosyncrasies of the original studies.

--

Rich Ulrich

Oct 10, 2021, 3:37:53 PM

Let's try the case for developing a new AI algorithm to help screen/detect/diagnose the disease, e.g., CoVid-19. The algorithm could use the medical images as input or use all other relevant information.

Now we would face the question of comparing the performances of different algorithms. As a standard practice, we would need to compare the newly developed algorithm against the state-of-the-art algorithms. We could implement those published algorithms and then compare them with the new one using the same dataset we have. A more convenient alternative is to compare the performances of the new one we produced with those of the published paper using other datasets. Could we perform the second approach using the t-test or what else should we use?

Oct 11, 2021, 8:06:14 AM

On Sun, 10 Oct 2021 12:37:51 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

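> Let's try the case for developing a new AI algorithm to help

> screen/detect/diagnose the disease, e.g., CoVid-19. The algorithm

> could use the medical images as input or use all other relevant

> information.

The picture of the lung is relatively specific. But Covid reportedly

affects a whole slew of systems. I wonder how many of them are

easy to examine and compare.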

>

> Now we would face the question of comparing the performances of

> different algorithms. As a standard practice, we would need to compare

> the newly developed algorithm against the state-of-the-art algorithms.

> We could implement those published algorithms and then compare them

> with the new one using the same dataset we have.

Yes - I think that any "algorithm" approach will always apply all

algorithms to the same data. There is ENORMOUSLY more power

in doing the "paired" comparisons than comparing to something

derived on some other sets of data, no matter how well defined

their sampling is. Presumably, you look for sensitivity and

specificity, and have to make some judgment on the cases where

two algorithms disagree (which is not possible, for two samplings).

"Gold standards" of dx may figure in, somewhere.
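
One common way to run such a paired comparison is McNemar's test, which uses only the cases where the two algorithms disagree. This is a minimal sketch and not necessarily what any given paper does. It assumes both algorithms were run on the same cases and that a gold-standard label is available for each. The function names are hypothetical. To compare sensitivities, restrict the input to gold-standard positives; for specificities, restrict it to negatives.

```python
from math import comb

def discordant_counts(truth, pred1, pred2):
    """Count cases where exactly one of the two algorithms matches the truth."""
    b = sum(1 for t, x, y in zip(truth, pred1, pred2) if x == t and y != t)
    c = sum(1 for t, x, y in zip(truth, pred1, pred2) if x != t and y == t)
    return b, c

def mcnemar_exact(b, c):
    """Exact two-sided McNemar p-value.

    b: cases algorithm 1 got right and algorithm 2 got wrong.
    c: cases algorithm 2 got right and algorithm 1 got wrong.
    Under the null hypothesis, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no disagreements, so no evidence either way
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical discordant counts: 15 cases favour algorithm 1, 5 favour algorithm 2.
p_value = mcnemar_exact(15, 5)
print(f"exact McNemar p = {p_value:.4f}")
```

Cases that both algorithms get right, or both get wrong, drop out of the test entirely, which is where the extra power of the paired design comes from. With two different samples there are no such pairs to form.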

> A more convenient

> alternative is to compare the performances of the new one we produced

> with those of the published paper using other datasets. Could we

> perform the second approach using the t-test or what else should we

> use?

What do you imagine comparing, for two different samples and

two different algorithms?

If they come up with different rates of disease, you won't know

why.

--

Rich Ulrich

Oct 11, 2021, 11:00:36 AM

Let's clarify some points for the AI algorithms based on the dataset of patient images.

A general pattern in this kind of research is: a new algorithm is proposed and its performance is investigated, e.g., sensitivity or specificity. This is done by comparing the AI results against the gold standard, e.g., the PCR test or something else. In addition, the paper will also present the results of other published AI algorithms to show that the proposed one is better.

If the paper implemented the published algorithms, then the standard t-test for the difference of the random variables is performed. However, sometimes, the paper chose to compare its own results with the results published in other papers. Apparently, one cannot directly compare the sensitivity/specificity of the proposed algorithm with those of other published papers. How do we formally do this comparison then?

A sad truth is that, for CoVid-19, the publicly available and large datasets of patient images are still scarce. Maybe this is why some papers chose to compare their own results of the proposed algorithm based on a small to medium dataset with the results of the published paper based on a large dataset.

Oct 12, 2021, 12:49:13 PM

On Mon, 11 Oct 2021 08:00:33 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

>Let's clarify some points for the AI algorithms based on the dataset of patient images.

>

>A general pattern of this kind of researches is: a new algorithm was

> proposed and its performance was investigated, e.g., sensitivity or

> specificity. This was realized by comparing the AI results against the

> gold standard, e.g., the PCR test or something else. In addition to

> that, the paper will also present the results of other published AI

> algorithms to show that the proposed one is better.

Sensitivity/specificity go hand in hand. There is a whole curve to

compare. The test that is best at one extreme may not be best

at the other. One Covid-antigen survey in California, mid-2020,

used two different cut-offs for "yes, this person has been infected"

- depending on the base-rate of illness in that region. The final

estimates of disease prevalence made efforts (applied formulas)

to account for false-positives and false-negatives in the raw data.
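
The post does not say which formulas that survey applied. One standard correction of this kind is the Rogan-Gladen estimator, sketched here as an assumption rather than as the survey's actual method. It recovers true prevalence from the raw positive rate given the test's sensitivity and specificity. The function name and the numbers are hypothetical.

```python
def rogan_gladen(apparent_prevalence, sensitivity, specificity):
    """Estimate true prevalence from the observed fraction of positive tests.

    Solves  apparent = sens * true + (1 - spec) * (1 - true)  for `true`,
    then clips the result to the valid range [0, 1].
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must be better than chance (sens + spec > 1)")
    estimate = (apparent_prevalence + specificity - 1.0) / denom
    return min(1.0, max(0.0, estimate))

# Hypothetical survey: 5% test positive, with sensitivity 0.80 and specificity 0.98.
true_prev = rogan_gladen(0.05, sensitivity=0.80, specificity=0.98)
print(f"corrected prevalence = {true_prev:.4f}")
```

The correction depends on the cut-off, since moving the cut-off trades sensitivity against specificity. This is one reason a single sensitivity/specificity pair from one paper is hard to compare with a pair from another.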

>

> If the paper implemented the published algorithms, then the

> standard t-test for the difference of the random variables is

> performed.

- paired tests - Good power, and no question about "sample"

differences.

> However, sometimes, the paper chose to compare its own

> results with the results published in other papers. Apparently, one

> cannot directly compare the sensitivity/specificity of the proposed

> algorithm with those of other published papers. How do we formally

> do this comparison then?

You write, "One cannot directly [do A]... How do we formally [do A]?"

As I wrote last time: You can do the test. Then you have to argue

that your "significant" effect is large enough that it would be robust

against the likely or possible /confounding/ differences between

samples.

Your best chance of that is when the potential replacement is

tested in conditions that provide /lower/ expectations of good

outcome.

>

> A sad truth is that, for CoVid-19, the publicly available and

> large datasets of patient images are still scarce. Maybe this is why

> some papers chose to compare their own results of the proposed

> algorithm based on a small to medium dataset with the results of the

> published paper based on a large dataset.

Exploratory work. "We think we have a good competitor" because

it is cheaper and uses better science.

--

Rich Ulrich

Oct 12, 2021, 2:51:17 PM

Rich Ulrich wrote on Wednesday, October 13, 2021 at 12:49:13 AM [UTC+8]:

> On Mon, 11 Oct 2021 08:00:33 -0700 (PDT), Cosine

> wrote:
> ....

> > However, sometimes, the paper chose to compare its own

> > results with the results published in other papers. Apparently, one

> > cannot directly compare the sensitivity/specificity of the proposed

> > algorithm with those of other published papers. How do we formally

> > do this comparison then?

> You write, "One cannot directly [do A]... How do we formally [do A]?"

>

> As I wrote last time: You can do the test. Then you have to argue

> that your "significant" effect is large enough that it would be robust

> against the likely or possible /confounding/ differences between

> samples.

>

By "we cannot directly compare ..." I meant that we cannot compare directly mu1 > mu2
and then claim that algorithm-1 performs better. However, if the other paper provided mu2, SE2,

and n2 (the sample size), we should be able to use this information to test the statistical

significance of the difference (mu1-mu2) with the t-test, since the t-test formula uses

only those three quantities from each of the two samples: mu, SE, and n.
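
A minimal sketch of that calculation, assuming the two samples are independent and using the unequal-variance (Welch) form of the t statistic, which needs only the mean, standard error, and sample size from each paper. The function name and the numbers are hypothetical. The p-value uses a normal approximation to stay within the standard library, so it is slightly optimistic when the degrees of freedom are small.

```python
import math

def welch_from_summary(m1, se1, n1, m2, se2, n2):
    """Welch t statistic and degrees of freedom from summary statistics.

    m: sample mean, se: standard error of the mean, n: sample size.
    Returns (t, Welch-Satterthwaite df, two-sided normal-approximation p).
    """
    se_diff = math.sqrt(se1 ** 2 + se2 ** 2)
    t = (m1 - m2) / se_diff
    df = (se1 ** 2 + se2 ** 2) ** 2 / (
        se1 ** 4 / (n1 - 1) + se2 ** 4 / (n2 - 1)
    )
    p_normal = math.erfc(abs(t) / math.sqrt(2))
    return t, df, p_normal

# Hypothetical: our algorithm on a small dataset vs. a published one on a large dataset.
t, df, p = welch_from_summary(m1=0.92, se1=0.02, n1=100, m2=0.88, se2=0.01, n2=1000)
print(f"t = {t:.3f}, df = {df:.1f}, p (normal approx.) = {p:.3f}")
```

If SciPy is available, `scipy.stats.ttest_ind_from_stats` with `equal_var=False` computes the same test with an exact t-distribution p-value. It takes standard deviations rather than standard errors, so convert with `sd = se * sqrt(n)`. For sensitivity or specificity, n is the number of gold-standard positive or negative cases, not the total number of images.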

Oct 14, 2021, 2:00:06 PM

On Tue, 12 Oct 2021 11:51:15 -0700 (PDT), Cosine <ase...@gmail.com>

wrote:

>Rich Ulrich wrote on Wednesday, October 13, 2021 at 12:49:13 AM [UTC+8]:

Okay, "directly" meant "with no test".

Do keep in mind my warning,

> Then you have to argue

> that your "significant" effect is large enough that it would be robust

> against the likely or possible /confounding/ differences between

> samples.

>

--
Rich Ulrich
