I don't see much discussion in the original papers other than the
feeling that a couple of hundred comparisons is quite healthy.
Looking at work published by others, I rarely see any justification for
the sample size - sometimes the authors power the study on the basis of a
correlation coefficient (!) and other times they refer to a 'convenience
sample', which is a little too cute for me at this time.
In my case we would not be able to do more than 50 comparisons a year
- the events (neonatal problems) are quite rare even in a specialist
unit.
I've found a letter (Stockl et al., DOI: 10.1373/clinchem.2004.036095)
that seems to tackle the issue of confidence intervals, but I'm way out
of my statistical comfort zone here.
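For a rough feel for what 50 comparisons would buy, the Bland-Altman
approximation SE(limit of agreement) ~ sqrt(3/n) x SD of the differences
can be turned straight into a confidence-interval half-width. The sketch
below is purely illustrative (an assumed SD of 1 unit, not data from our
unit):

  import numpy as np
  from scipy import stats

  def loa_ci_halfwidth(sd_diff, n, level=0.95):
      """Approximate CI half-width for a 95% limit of agreement.

      Uses the Bland-Altman large-sample approximation
      SE(limit of agreement) ~ sqrt(3/n) * SD of the differences."""
      se = np.sqrt(3.0 / n) * sd_diff
      t = stats.t.ppf(1 - (1 - level) / 2, df=n - 1)
      return t * se

  # Purely illustrative: assumed SD of the paired differences = 1 unit.
  for n in (50, 100, 200):
      print(n, round(loa_ci_halfwidth(1.0, n), 2))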
Best wishes to all
kuzmastabi
You do not need to test whether two methods are different --- they ARE.
The actual difference determines the sample size you need to get a
significant test, but this is rather beside the point.
The Bland & Altman 1999 paper I mentioned is:
@Article{Bland.1999,
  author  = {Bland, J.M. and Altman, D.G.},
  title   = {Measuring agreement in method comparison studies},
  journal = {Statistical Methods in Medical Research},
  year    = {1999},
  volume  = {8},
  pages   = {136--160},
}
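For reference, the basic 95% limits of agreement in that paper are just the
mean difference +/- 1.96 SD of the paired differences; a minimal sketch on
simulated data (not a real comparison, and the bias of 0.3 is an arbitrary
assumption):

  import numpy as np

  rng = np.random.default_rng(1)
  method_a = rng.normal(10.0, 2.0, size=50)              # simulated method A
  method_b = method_a + rng.normal(0.3, 0.5, size=50)    # method B, assumed bias 0.3

  d = method_a - method_b
  bias, sd = d.mean(), d.std(ddof=1)
  print(f"bias {bias:.2f}, 95% limits of agreement "
        f"({bias - 1.96 * sd:.2f}, {bias + 1.96 * sd:.2f})")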
Best wishes,
Bendix Carstensen
______________________________________________
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
+45 44 43 73 13 (fax)
b...@steno.dk http://www.biostat.ku.dk/~bxc
______________________________________________
Martin
--
***************************************************
J. Martin Bland
Prof. of Health Statistics
Dept. of Health Sciences
Seebohm Rowntree Building Area 2
University of York
Heslington
York YO10 5DD
Email: mb...@york.ac.uk
Phone: 01904 321334
Fax: 01904 321382
Web site: http://martinbland.co.uk/
***************************************************
>Bland-Altman analysis is not a testing problem, hence power is
>irrelevant. It is an ESTIMATION problem;
Whilst it is literally true that power (as normally defined) is
'irrelevant', I would suggest that a directly analogous concept, and
certainly the estimation of required/desirable sample size, is as relevant
to estimation as it is to inference.
When estimating, one will generally be aiming to obtain a particular degree
of precision (e.g. the width of a confidence interval) for one's estimate, and
the value of that desired 'degree of precision' (as well as the variability of
the data) is what dictates the required sample size.
I would suggest that in many situations, one can get an estimate of the
appropriate sample size by considering the corresponding inference
situation. In other words (thinking aloud!), if one wishes to produce,
say, an estimate of a difference with a probability of (1-beta) that the
95% CI of that difference would be narrower than a certain value, then it
would probably be appropriate to use the same sample size that one would
estimate was necessary to detect that 'certain value of difference' as
significant with alpha = 0.05 and power of (1-beta).
That's how I see it, anyway.
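To make the analogy concrete, here is a back-of-envelope sketch of the two
sample-size formulas side by side (one-sample/paired setting, normal
approximation, SD assumed known; all numbers made up):

  from scipy import stats

  def n_for_power(delta, sigma, alpha=0.05, power=0.80):
      """n to detect a true difference delta with the given power."""
      za, zb = stats.norm.ppf(1 - alpha / 2), stats.norm.ppf(power)
      return ((za + zb) * sigma / delta) ** 2

  def n_for_halfwidth(w, sigma, alpha=0.05):
      """n so that the expected 95% CI half-width equals w."""
      return (stats.norm.ppf(1 - alpha / 2) * sigma / w) ** 2

  sigma, delta = 1.0, 0.5
  print(n_for_power(delta, sigma))       # about 31
  print(n_for_halfwidth(delta, sigma))   # about 15: a half-width equal to delta
  # is cheaper than 80% power to detect delta, so the two approaches coincide
  # only roughly (exactly, if the target half-width is chosen somewhat smaller).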
Kind Regards,
John
----------------------------------------------------------------
Dr John Whittington, Voice: +44 (0) 1296 730225
Mediscience Services Fax: +44 (0) 1296 738893
Twyford Manor, Twyford, E-mail: Joh...@mediscience.co.uk
Buckingham MK18 4EL, UK
----------------------------------------------------------------
Fair enough! What John W is hovering around here is the "formal"
fact that there is a one-to-one correspondence between hypothesis
tests and confidence intervals, at any rate for "parametric" tests
(i.e. testing a Null Hypothesis that parameter theta = theta0, the
"null value") where there is a range of testable parameter values.
Namely:
Let T be a test statistic, i.e. something calculated in a particular
way from the data. When H0: theta = theta0 is true, this has a
particular distribution (the "null distribution"). Certain sets of
values of T are viewed as "extreme" in the sense of indicating
incompatibility between data and hypothesis. For a chosen small
probability alpha (the "size" of the test), designate as the "critical set"
the most extreme set whose total probability when H0 is true
is alpha. Then, if the outcome (the value of T calculated from the
data to hand) falls in the critical set, reject the Null Hypothesis.
Starting from any test procedure compatible with the above, the
set of all possible values of theta which, if adopted in turn as
"theta0" in a test based on the statistic T, are not rejected
by the corresponding test at size alpha, is a confidence interval
(or, more precisely, a confidence set, since in some applications
it may not be an interval) for the value of the parameter theta,
and its confidence level is P = 1-alpha. This is the case because,
whatever value of theta happens to be the true value, this procedure
generates a set of theta-values which has probability alpha of NOT
including the true value, hence probability P = 1-alpha of INCLUDING
it, regardless of what the true value is; and that's a confidence
set by definition.
Conversely:
Given any procedure for calculating a confidence interval (or set)
for a parameter theta -- by definition, whatever the true value
may be, the probability that it includes this true value is the
assigned confidence level P -- the test procedure which consists
of "for given theta0, reject H0: theta = theta0 if theta0 is not
in the confidence interval (set)" is a test of the Null Hypothesis
theta = theta0 which has size alpha = 1 - P, whatever the
value of theta0 may be.
It follows that any considerations of practical importance which
are relevant to hypothesis tests are equally relevant to confidence
intervals, and vice versa.
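As a quick numerical check of that duality, for the familiar one-sample t
situation (made-up data), the set of theta0 values not rejected at the 5%
level reproduces the textbook 95% t interval:

  import numpy as np
  from scipy import stats

  rng = np.random.default_rng(0)
  x = rng.normal(5.0, 2.0, size=30)        # made-up sample
  alpha = 0.05

  # Confidence set by test inversion: all theta0 NOT rejected at level alpha.
  grid = np.linspace(x.mean() - 3, x.mean() + 3, 2001)
  kept = [th for th in grid if stats.ttest_1samp(x, popmean=th).pvalue > alpha]
  print("inverted test :", min(kept), max(kept))

  # Textbook 95% t interval -- agrees up to the grid resolution.
  se = x.std(ddof=1) / np.sqrt(len(x))
  t = stats.t.ppf(1 - alpha / 2, df=len(x) - 1)
  print("t interval    :", x.mean() - t * se, x.mean() + t * se)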
In practice, confidence intervals are very often used to generate
tests of Null Hypotheses (e.g. "Do the confidence intervals for
the means of the two groups overlap, or not?", or "Does the
confidence interval for the odds ratio include the value 1?"),
but this is usually done retrospectively, taking for granted
whatever results one has to hand. On the other hand, test procedures
tend to be determined beforehand, often in terms of power calculations.
So there tends to be an apparent asymmetry in practice.
But, just as the sample size required for a test of size alpha
to be (with probability 1 - beta) able to detect a difference
(between true theta and null-hypothesis theta0) of at least a
given amount ("minimum clinically interesting difference")
is important, so also is the exact analogue in confidence
intervals, namely the sample size required to have probability
1-beta that the width of the P% confidence interval will be at
most a given amount (maximum acceptable "error margin").
Which comes back to the point John is making.
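That "assurance" calculation can be done by a direct search, using the fact
that the sample variance of a normal sample is a scaled chi-squared variable;
a sketch, with a guessed SD (both the SD and the target half-width below are
arbitrary illustrations):

  from scipy import stats

  def n_for_ci_assurance(w, sigma, alpha=0.05, assurance=0.80, n_max=100000):
      """Smallest n with P(95% CI half-width for a normal mean <= w) >= assurance,
      given a guessed SD sigma.  Half-width = t_{n-1} * s / sqrt(n), and
      (n-1) * s^2 / sigma^2 is chi-squared with n-1 degrees of freedom."""
      for n in range(2, n_max):
          t = stats.t.ppf(1 - alpha / 2, df=n - 1)
          chi = stats.chi2.ppf(assurance, df=n - 1)
          if (n - 1) * n * w**2 / (t**2 * sigma**2) >= chi:
              return n
      return None

  # Guessed SD 1.0, target half-width 0.5: a few more subjects than the ~17
  # needed for the *expected* half-width to reach 0.5.
  print(n_for_ci_assurance(w=0.5, sigma=1.0))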
Note, however, that while the above correspondence is clean
and definite in the circumstances stated, it may be less so,
or even quite indefinite, in others. For example, non-parametric
tests tend not to correspond to a definite parametrisation
in terms of a parameter for which one might derive a confidence
interval by any means.
E.g. (think of the discussion we had a while back), in a Mann-Whitney
U test for a difference between two samples X and Y, the expected
value of U depends only on the probability that a random X is
less than a random Y, but the actual distribution of U depends
on other features of the two distributions, so there is not a
parametrisation of the U distribution which one could use to
generate confidence intervals, except under more restrictive
assumptions.
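To make that concrete: U divided by the product of the two sample sizes
estimates P(X < Y) whatever the shapes of the two distributions; a quick
sketch with simulated data (deliberately different spreads):

  import numpy as np

  rng = np.random.default_rng(2)
  x = rng.normal(0.0, 1.0, size=40)    # made-up sample X
  y = rng.normal(0.5, 2.0, size=60)    # made-up sample Y, wider spread

  # U counted directly as the number of (x, y) pairs with x < y.
  u = np.sum(x[:, None] < y[None, :])
  print("U / (n_x * n_y) estimates P(X < Y):", u / (len(x) * len(y)))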
On the other hand, given a sample X, one can construct a
"non-parametric" confidence interval for, say, the median of X
(or any other percentile of the distribution) in terms of the
quantiles of the sample, and from this one could generate a
test of whether the median has a particular value.
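A sketch of that construction (simulated, deliberately skewed data), using
the fact that the number of observations falling below the true median is
Binomial(n, 1/2):

  import numpy as np
  from scipy import stats

  def median_ci(x, level=0.95):
      """Distribution-free CI for the median from order statistics:
      (x_(k), x_(n-k+1)) covers the median with probability
      1 - 2 * BinomCDF(k - 1; n, 1/2)."""
      x = np.sort(np.asarray(x))
      n = len(x)
      k = max(int(stats.binom.ppf((1 - level) / 2, n, 0.5)), 1)
      coverage = 1 - 2 * stats.binom.cdf(k - 1, n, 0.5)
      return x[k - 1], x[n - k], coverage

  rng = np.random.default_rng(3)
  sample = rng.lognormal(size=30)      # skewed, made-up data
  low, high, cov = median_ci(sample)
  print(f"median CI ({low:.2f}, {high:.2f}), exact coverage {cov:.3f}")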
Best wishes to all,
Ted.
--------------------------------------------------------------------
E-Mail: (Ted Harding) <ted.h...@nessie.mcc.ac.uk>
Fax-to-email: +44 (0)870 094 0861
Date: 30-Apr-07 Time: 13:18:03
>Fair enough! What John W is hovering around here is the "formal"
>fact that there is a one-to-one correspondence between hypothesis
>tests and confidence intervals, at any rate for "parametric" tests
>(i.e. testing a Null Hypothesis that parameter theta = theta0, the
>"null value") where there is a range of testable parameter values.
Indeed so. I actually thought I had written that qualifier (that I was
talking about the 'parametric' situation) in my posting (it was certainly in
my mind!) but, looking back at what I wrote, I see that I didn't! As Ted
says, the correspondence between estimation and inference is much more
complicated (and less clear-cut) when one moves into the 'non-parametric'
arena.
However, I would go so far as to suggest that, although the correspondence
is 'more complicated and less clear-cut' in the 'non-parametric' case, the
concepts still exist - i.e. the probability of an estimate achieving a
desired degree of precision (cf. 'power') will depend upon the sample size.
Power is the probability of rejecting the null hypothesis, which to me
seems to be a rather daft quantity.
The relevant quantity will always be the precision with which the
parameter of interest is determined.
In practical situations the latter is not very dependent on the size of
the parameter of interest, and that is exactly the point. Why should the
sample size needed to determine the effect (i.e. the SIZE) of blood pressure
lowering depend on the size of the anticipated effect? Whether it
is 5 mmHg or 40 mmHg, a precision of, say, +/- 5 mmHg would be desirable,
because we will then be able to detect any worthwhile effect and be
precise about the size of it.
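As a sketch of that arithmetic (two parallel arms of equal size, and an SD of
10 mmHg assumed purely for illustration), the per-arm n for +/- 5 mmHg
precision does not involve the anticipated effect at all:

  from math import ceil
  from scipy import stats

  def n_per_arm_for_precision(halfwidth, sd, alpha=0.05):
      """Per-arm n so the 95% CI for a difference of two means has
      half-width <= halfwidth (normal approximation, equal arms)."""
      z = stats.norm.ppf(1 - alpha / 2)
      return ceil(2 * (z * sd / halfwidth) ** 2)

  # Assumed SD of 10 mmHg; note the anticipated effect never enters.
  print(n_per_arm_for_precision(halfwidth=5, sd=10))   # about 31 per arm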
One may argue that if the effect is in fact large, it would be unethical
to enroll so many patients as to get the desired precision. But in
current practice the "anticipated" effects used in power
calculations are often unrealistically large in order to get the sample
size to fit in the budget, so the problem is more likely that patients
are enrolled in inconclusive studies.
When it comes to epidemiological studies where estimation is the core of
the business, the power concept is even more ridiculous.
It would be interesting if we could have God or some other guy with
similar gifts to produce a plot of anticipated effects versus actually
estimated effects for all studies (including the inconclusive ones that
are never published). Any guesses as to how it might look?
Best,
Bendix
______________________________________________
Bendix Carstensen
Senior Statistician
Steno Diabetes Center
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
+45 44 43 73 13 (fax)
b...@steno.dk http://www.biostat.ku.dk/~bxc
______________________________________________
>Power is the probability of rejecting the null hypothesis, which to me
>seems to be a rather daft quantity.
>The relevant quantity will always be the precision with which the
>parameter of interest is determined.
>In practical situations the latter is not very dependent on the size of
>the parameter of interest, and that is exactly the point. Why should the
>sample size needed to determine the effect (i.e. the SIZE) of blood pressure
>lowering depend on the size of the anticipated effect? Whether it
>is 5 mmHg or 40 mmHg, a precision of, say, +/- 5 mmHg would be desirable,
>because we will then be able to detect any worthwhile effect and be
>precise about the size of it.
Indeed so - and that is the heart of the argument (with which I have
considerable sympathy) against hypothesis tests.
However, I still contend that (certainly in relation to 'parametric'
situations) a mathematical analogue of power (namely the 'power' to achieve
the desired level of precision) DOES exist in relation to estimation - and,
as both Ted and I have said, that 'power' is mathematically more-or-less
identical to the ('real') power used in inference.