Q comparing the two groups in the same or different publications

Cosine

Oct 9, 2021, 8:08:55 PM

Hi:

Suppose we did a study. In this study, we tested the effects of drugs A and B, and of a placebo, in treating disease Z. We could use a two-sample t-test on the outcomes for A and B to see whether the difference between the two drugs is statistically significant. The formula requires only the sample means, standard errors, and sample sizes of the drug A and drug B groups.

Now, suppose we found another study that tested the effects of drugs C and D, and of a placebo, in treating disease Z. Could we use the t-test again to determine whether drug A differs from drug C, and drug A from drug D, in treating disease Z, given only that summary information and not the raw data?

Thank you,

David Duffy

Oct 9, 2021, 8:50:23 PM

Cosine <ase...@gmail.com> wrote:
> Suppose we did a study. In this study, we tested the effects of drugs A and B

> Now, suppose we found another study that tested the effects of drugs C and D

See "network meta-analysis".

Cosine

Oct 10, 2021, 8:23:13 AM

What if the purpose is to compare the drug A published in paper 1, drug B in paper 2, and so on?

Could we again use the t-test for comparing the data from different papers?

Rich Ulrich

Oct 10, 2021, 2:14:27 PM

On Sat, 9 Oct 2021 17:08:53 -0700 (PDT), Cosine <ase...@gmail.com>
wrote:

>Hi:
>
> Suppose we did a study. In this study, we tested the effects of drugs A and B, and of a placebo, in treating disease Z. We could use a two-sample t-test on the outcomes for A and B to see whether the difference between the two drugs is statistically significant. The formula requires only the sample means, standard errors, and sample sizes of the drug A and drug B groups.
>
> Now, suppose we found another study that tested the effects of drugs C and D, and of a placebo, in treating disease Z. Could we use the t-test again to determine whether drug A differs from drug C, and drug A from drug D, in treating disease Z, given only that summary information and not the raw data?
>

Most studies only test ONE drug against placebo. They
care about one drug, and they want all their "power" to
go to that comparison.

For the purpose of your question, comparing A to C
(or to D), you would be looking at the performance
of each drug in comparison to pbo.
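
As a rough sketch of what "each drug in comparison to pbo" looks like on paper (sometimes called an adjusted indirect comparison): write d_A,pbo and d_C,pbo for the estimated drug-minus-placebo differences in the two studies, with their standard errors. These symbols are introduced here for illustration, and the formula assumes the two studies are independent and their placebo arms are comparable.

```latex
\[
\hat{d}_{AC} = \hat{d}_{A,\mathrm{pbo}} - \hat{d}_{C,\mathrm{pbo}}, \qquad
\mathrm{SE}(\hat{d}_{AC}) = \sqrt{\mathrm{SE}_{A,\mathrm{pbo}}^{2} + \mathrm{SE}_{C,\mathrm{pbo}}^{2}}, \qquad
z = \frac{\hat{d}_{AC}}{\mathrm{SE}(\hat{d}_{AC})}
\]
```

Because the two variances add, the indirect comparison is always less precise than either placebo comparison on its own.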

Describing the studies as having "two drugs" is a red
herring, or it is a non-informative complication.

Here is a modern form of your question, of current interest --

If one Covid vaccine shows 95% protection in its main study
and another vaccine shows 90% protection in its study, can
we conclude that the first is better than the second? What
about, compared to 80%?

Well, as a mechanical proposition, we certainly can take the
estimates and their SEs and generate a test. But we KNOW
that the samples differed (location; age/sex/ethnicity?). If they
were in a different time frame (or, even if not), maybe they
were tested against a different dominant mutation of the virus.
The instructions for case-ascertainment may have differed.
And so on.
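
To make the "mechanical proposition" concrete, here is a minimal sketch of one way to generate such a test, assuming each trial reports case counts per arm. It compares the two trials on the log relative-risk scale with a large-sample normal approximation. The counts and function names are made up for illustration and are not taken from any real vaccine study.

```python
import math

def log_rr_and_se(cases_vax, n_vax, cases_pbo, n_pbo):
    """Log relative risk (vaccine vs placebo) and its large-sample SE."""
    log_rr = math.log((cases_vax / n_vax) / (cases_pbo / n_pbo))
    se = math.sqrt(1 / cases_vax - 1 / n_vax + 1 / cases_pbo - 1 / n_pbo)
    return log_rr, se

def compare_two_trials(trial_1, trial_2):
    """z statistic and two-sided p-value for the difference in log RR.

    Treats the two trials as independent samples; it does nothing
    about confounding differences between them.
    """
    lr1, se1 = log_rr_and_se(*trial_1)
    lr2, se2 = log_rr_and_se(*trial_2)
    z = (lr1 - lr2) / math.sqrt(se1**2 + se2**2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

# Hypothetical counts: (cases_vax, n_vax, cases_pbo, n_pbo)
trial_1 = (8, 20000, 160, 20000)   # efficacy = 1 - 8/160 = 95%
trial_2 = (15, 15000, 150, 15000)  # efficacy = 1 - 15/150 = 90%
z, p = compare_two_trials(trial_1, trial_2)
print(f"z = {z:.2f}, two-sided p = {p:.3f}")
```

With these made-up counts the difference between 95% and 90% does not reach p < 0.05, because the efficacy estimates rest on only a handful of cases in the vaccine arms.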

95% vs 90% is based on small enough numbers that, if p < 0.05,
it probably is not p < 0.001 (or better). So that "tested" difference
is unpersuasive. We /know/ that uncontrolled factors /exist/
and thus could be responsible. For establishing that one is better,
a test is necessary but not sufficient. We would have heard more
if one of the vaccines had come in at only (say) 75%, which
a priori, before the studies, based on flu vaccines, did not seem
like a terrible efficacy.

We want to see an "effect size" large enough that it is unlikely
to have happened by chance. If those "confounding factors"
seem small, or if they exist such that they would bias /against/
the better performing drug, then a test on their difference
showing a bigger difference can be a bit persuasive. There's
all those (educated) readers whom you have to convince.

For Covid, they seem to use all three obvious criteria --
getting symptoms, getting hospitalized, dying. A vaccine
does look better if it looks better on all three criteria.
Performance in whole populations (states, countries) also
washes out the idiosyncrasies of the original studies.

--
Rich Ulrich

Cosine

Oct 10, 2021, 3:37:53 PM

Let's try the case for developing a new AI algorithm to help screen/detect/diagnose the disease, e.g., CoVid-19. The algorithm could use the medical images as input or use all other relevant information.

Now we would face the question of comparing the performances of different algorithms. As a standard practice, we would need to compare the newly developed algorithm against the state-of-the-art algorithms. We could implement those published algorithms and then compare them with the new one using the same dataset we have. A more convenient alternative is to compare the performances of the new one we produced with those of the published paper using other datasets. Could we perform the second approach using the t-test or what else should we use?

Rich Ulrich

Oct 11, 2021, 8:06:14 AM

On Sun, 10 Oct 2021 12:37:51 -0700 (PDT), Cosine <ase...@gmail.com>
wrote:

>

> Let's try the case for developing a new AI algorithm to help
> screen/detect/diagnose the disease, e.g., CoVid-19. The algorithm
> could use the medical images as input or use all other relevant
> information.

The picture of the lung is relatively specific. But Covid reportedly
affects a whole slew of systems. I wonder how many of them are
easy to examine and compare.

>
> Now we would face the question of comparing the performances of
> different algorithms. As a standard practice, we would need to compare
> the newly developed algorithm against the state-of-the-art algorithms.
> We could implement those published algorithms and then compare them
> with the new one using the same dataset we have.

Yes - I think that any "algorithm" approach will always apply all
algorithms to the same data. There is ENORMOUSLY more power
in doing the "paired" comparisons than comparing to something
derived on some other sets of data, no matter how well defined
their sampling is. Presumably, you look for sensitivity and
specificity, and have to make some judgment on the cases where
two algorithms disagree (which is not possible, for two samplings).
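
One common way to run a paired comparison of two classifiers on the same cases is McNemar's test, which looks only at the cases where the two algorithms disagree. The thread does not name a specific test, so treat this as one possible sketch. The function names are made up for illustration.

```python
from math import comb

def discordant_counts(truth, pred_1, pred_2):
    """Count cases where exactly one of the two algorithms is correct."""
    b = sum(p1 == t and p2 != t for t, p1, p2 in zip(truth, pred_1, pred_2))
    c = sum(p1 != t and p2 == t for t, p1, p2 in zip(truth, pred_1, pred_2))
    return b, c

def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value.

    b = cases algorithm 1 got right and algorithm 2 got wrong
    c = cases algorithm 2 got right and algorithm 1 got wrong
    Under the null, b follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2**n
    return min(1.0, 2 * tail)

# Toy example: gold-standard labels and two algorithms' calls
truth  = [1, 1, 1, 0, 0, 0, 1, 0]
pred_1 = [1, 1, 1, 0, 0, 1, 1, 0]
pred_2 = [1, 0, 0, 0, 1, 1, 1, 0]
b, c = discordant_counts(truth, pred_1, pred_2)
print(b, c, mcnemar_exact(b, c))
```

Cases that both algorithms get right, or both get wrong, drop out of the test entirely, which is why pairing on the same data gives so much more power than comparing two separate samples.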

"Gold standards" of dx may figure in, somewhere.

> A more convenient
> alternative is to compare the performances of the new one we produced
> with those of the published paper using other datasets. Could we
> perform the second approach using the t-test or what else should we
> use?

What do you imagine comparing, for two different samples and
two different algorithms?
If they come up with different rates of disease, you won't know
why.

--
Rich Ulrich

Cosine

Oct 11, 2021, 11:00:36 AM

Let's clarify some points for the AI algorithms based on the dataset of patient images.

A general pattern of this kind of research is: a new algorithm was proposed and its performance was investigated, e.g., sensitivity or specificity. This was done by comparing the AI results against the gold standard, e.g., the PCR test or something else. In addition to that, the paper will also present the results of other published AI algorithms to show that the proposed one is better.

If the paper implemented the published algorithms, then the standard t-test for the difference of the random variables is performed. However, sometimes, the paper chose to compare its own results with the results published in other papers. Apparently, one cannot directly compare the sensitivity/specificity of the proposed algorithm with those of other published papers. How do we formally do this comparison then?

A sad truth is that, for CoVid-19, the publicly available and large datasets of patient images are still scarce. Maybe this is why some papers chose to compare their own results of the proposed algorithm based on a small to medium dataset with the results of the published paper based on a large dataset.

Rich Ulrich

Oct 12, 2021, 12:49:13 PM

On Mon, 11 Oct 2021 08:00:33 -0700 (PDT), Cosine <ase...@gmail.com>
wrote:

>Let's clarify some points for the AI algorithms based on the dataset of patient images.
>
>A general pattern of this kind of research is: a new algorithm was
> proposed and its performance was investigated, e.g., sensitivity or
> specificity. This was done by comparing the AI results against the
> gold standard, e.g., the PCR test or something else. In addition to
> that, the paper will also present the results of other published AI
> algorithms to show that the proposed one is better.

Sensitivity/specificity go hand in hand. There is a whole curve to
compare. The test that is best at one extreme may not be best
at the other. One Covid-antigen survey in California, mid-2020,
used two different cut-offs for "yes, this person has been infected"
- depending on the base-rate of illness in that region. The final
estimates of disease prevalence made efforts (applied formulas)
to account for false-positives and false-negatives in the raw data.
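
For illustration, one standard correction of this kind is the Rogan-Gladen estimator, which adjusts the raw positive rate using the test's sensitivity and specificity. Which formula the California survey actually applied is not stated here, so this is only a sketch of the general idea. The function name and numbers are hypothetical.

```python
def corrected_prevalence(apparent, sensitivity, specificity):
    """Rogan-Gladen estimate of true prevalence from the raw positive rate.

    apparent    = fraction of the sample that tested positive
    sensitivity = P(test positive | diseased)
    specificity = P(test negative | not diseased)
    """
    denom = sensitivity + specificity - 1.0
    if denom <= 0:
        raise ValueError("test must beat chance: sensitivity + specificity > 1")
    estimate = (apparent + specificity - 1.0) / denom
    # Sampling noise can push the raw estimate outside [0, 1]; clip it.
    return min(1.0, max(0.0, estimate))

# Hypothetical: 5% test positive, with 90% sensitivity and 98% specificity
print(corrected_prevalence(0.05, 0.90, 0.98))
```

When the raw positive rate is close to the false-positive rate (1 - specificity), the corrected estimate becomes very unstable, which is one reason the choice of cut-off matters in low-prevalence regions.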

>
> If the paper implemented the published algorithms, then the
> standard t-test for the difference of the random variables is
> performed.

- paired tests - Good power, and no question about "sample"
differences.

> However, sometimes, the paper chose to compare its own
> results with the results published in other papers. Apparently, one
> cannot directly compare the sensitivity/specificity of the proposed
> algorithm with those of other published papers. How do we formally
> do this comparison then?

You write, "One cannot directly [do A]... How do we formally [do A]?"

As I wrote last time: You can do the test. Then you have to argue
that your "significant" effect is large enough that it would be robust
against the likely or possible /confounding/ differences between
samples.

Your best chance of that is when the potential replacement is
tested in conditions that provide /lower/ expectations of good
outcome.

>
> A sad truth is that, for CoVid-19, the publicly available and
> large datasets of patient images are still scarce. Maybe this is why
> some papers chose to compare their own results of the proposed
> algorithm based on a small to medium dataset with the results of the
> published paper based on a large dataset.

Exploratory work. "We think we have a good competitor" because
it is cheaper and uses better science.

--
Rich Ulrich

Cosine

Oct 12, 2021, 2:51:17 PM

On Wednesday, October 13, 2021 at 12:49:13 AM UTC+8, Rich Ulrich wrote:

> On Mon, 11 Oct 2021 08:00:33 -0700 (PDT), Cosine

> wrote:
> ....

> > However, sometimes, the paper chose to compare its own
> > results with the results published in other papers. Apparently, one
> > cannot directly compare the sensitivity/specificity of the proposed
> > algorithm with those of other published papers. How do we formally
> > do this comparison then?
> You write, "One cannot directly [do A]... How do we formally [do A]?"
>
> As I wrote last time: You can do the test. Then you have to argue
> that your "significant" effect is large enough that it would be robust
> against the likely or possible /confounding/ differences between
> samples.
>

By "we cannot directly compare ..." I meant that we cannot simply observe mu1 > mu2
and then claim that algorithm-1 performs better. However, if the other paper provides mu2, SE2,
and n2 (the sample size), we should be able to use this information to test the statistical
significance of the difference (mu1 - mu2) with the t-test, since the t-test formula
uses only those three quantities from each of the two samples: mu, SE, and n.
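
A minimal sketch of that calculation, assuming the unequal-variance (Welch) form of the t-test and that each SE is the standard error of the mean. The function name and the example numbers are made up for illustration.

```python
import math

def welch_t_from_summary(mean_1, se_1, n_1, mean_2, se_2, n_2):
    """Welch t statistic and approximate degrees of freedom.

    se_1 and se_2 are standard errors of the means (SD / sqrt(n)).
    The degrees of freedom use the Welch-Satterthwaite approximation.
    """
    v1, v2 = se_1**2, se_2**2
    t = (mean_1 - mean_2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n_1 - 1) + v2**2 / (n_2 - 1))
    return t, df

# Hypothetical summaries: (mean, SE, n) from our paper and from another paper
t, df = welch_t_from_summary(0.90, 0.02, 50, 0.85, 0.02, 50)
print(f"t = {t:.3f}, df = {df:.1f}")
# The p-value then comes from a t distribution with df degrees of freedom,
# e.g. from a statistics table or a stats library.
```

This gives a valid test of the two reported means as numbers, but it treats the two papers' samples as if they were drawn from the same population.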

Rich Ulrich

Oct 14, 2021, 2:00:06 PM

On Tue, 12 Oct 2021 11:51:15 -0700 (PDT), Cosine <ase...@gmail.com>
wrote:

>On Wednesday, October 13, 2021 at 12:49:13 AM UTC+8, Rich Ulrich wrote:

Okay, "directly" meant "with no test".

Do keep in mind my warning,

> Then you have to argue
> that your "significant" effect is large enough that it would be robust
> against the likely or possible /confounding/ differences between
> samples.
>

--
Rich Ulrich