Calculating sample size with more than one variance.


Başar Erdivanlı

Aug 11, 2014, 6:12:53 PM
to piface-d...@googlegroups.com
Hi, I would like to calculate a sample size for a study in which I am going to compare mean blood pressures.
I planned two groups:

The control group will receive drug A, a high dose of drug B, drug C, and drug D.
The study group will receive drug A, a low dose of drug B, and drug E.

I would like to see whether the two combinations make a difference in blood pressure or are equally effective.
**To make things clear, drugs A and B are used to put the volunteers to sleep.
**Drug C is used to slow the heart and drop the blood pressure.
**Drugs D and E are painkillers (low and high efficacy, respectively).
I tried these combinations in 20 volunteers (equally allocated) and have the means and standard deviations.

SD was ~4 in the first group and ~10 in the second group.
With an alpha error of 0.05, this yields a power of 58% for a true difference of means of only 8 mmHg.

My questions are:

1. Is 58% power enough for a pilot study? (I need 12 additional volunteers for a power of 80%, and 32 for 90%.)

2. Should I instead try to estimate the variance introduced by the low and high doses of drug B, and by drugs C and D, and calculate a combined SD? (By the way, I don't know how to do this.)

Lenth, Russell V

Aug 11, 2014, 8:30:07 PM
to <berdivanli@gmail.com>, piface-d...@googlegroups.com
As I understand it, you are taking one measurement on each patient -- so the details of which drugs are given just boil down to two treatments, defined as the combinations of drugs you stated. The fact that the SDs are so different is of some concern. One thing I wonder is if the mean of the second group is also a lot higher. If so, it might be that an analysis of a transformed response would bring the SDs more in line, and allow the use of the pooled t test.

Actually, I don't believe the SDs could possibly be that low. Blood pressures are pretty noisy measurements. I think what you may be reporting are the standard errors of the means, not the SDs, in which case the calculations are wrong and the correct results are way, way less significant.

I think you probably have enough from your pilot study to do a reasonable sample-size calculation. However, I caution you to decide the target effect size based on what kind of difference is important medically - not on what you observed in the pilot study. Is 10 mmHg considered an important difference? How about 5 mmHg? Based on medical criteria, decide where the upper limit of negligibility and the lower limit of importance lie, and use that for the target. Determine the sample size that would yield a good power - 80 or 90 percent - for that effect size. Then you'll have a good chance of catching an important difference, if it exists, while not wasting resources on getting enough data to detect an unimportant difference.
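A minimal sketch of such a sample-size calculation, using only the Python standard library. It assumes a two-sided comparison of two means with equal group sizes and uses a normal approximation rather than the t distribution, so it will tend to come out slightly smaller than a t-based tool would report. The SDs of 4 and 10 are the pilot values quoted in the first message; the function name `n_per_group` and the candidate targets of 10 and 5 mmHg are illustrative assumptions.

```python
import math
from statistics import NormalDist


def n_per_group(delta, sd1, sd2, alpha=0.05, power=0.80):
    """Approximate per-group sample size for a two-sided comparison of
    two means with unequal SDs (normal approximation, equal group sizes)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    n = (z_alpha + z_power) ** 2 * (sd1 ** 2 + sd2 ** 2) / delta ** 2
    return math.ceil(n)


# Pilot SDs from the thread; target differences chosen on medical grounds.
for delta in (10, 5):
    for power in (0.80, 0.90):
        print(f"target {delta} mmHg, power {power:.0%}: "
              f"n per group = {n_per_group(delta, 4, 10, power=power)}")
```

Halving the target difference roughly quadruples the required sample size, which is why the choice of a medically meaningful target matters more than the pilot's observed difference.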

I don't understand the last question, but if you keep the group sizes equal, the unequal SDs don't present much of a problem. The pooled and Welch t statistics will be equal in that case; only the d.f. will be less for the Welch test, making it slightly less powerful.
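The equal-group-size point can be checked numerically. The sketch below computes both statistics from summary data, using the standard pooled-variance and Welch-Satterthwaite formulas. The SDs (4 and 10) and group size (10) come from the first message; the group means of 50 and 58 mmHg are placeholders chosen only to give a difference of 8, and the function name is hypothetical.

```python
import math


def pooled_and_welch(mean1, sd1, n1, mean2, sd2, n2):
    """Return (t_pooled, df_pooled, t_welch, df_welch) from summary statistics."""
    diff = mean1 - mean2

    # Pooled t: one common variance estimate, n1 + n2 - 2 degrees of freedom.
    df_pooled = n1 + n2 - 2
    sp2 = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df_pooled
    t_pooled = diff / math.sqrt(sp2 * (1 / n1 + 1 / n2))

    # Welch t: separate variances, Satterthwaite approximation for the d.f.
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t_welch = diff / math.sqrt(v1 + v2)
    df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

    return t_pooled, df_pooled, t_welch, df_welch


# Means are placeholders; SDs and n are the pilot values from the thread.
t_p, df_p, t_w, df_w = pooled_and_welch(50, 4, 10, 58, 10, 10)
print(f"pooled: t = {t_p:.3f}, d.f. = {df_p}")
print(f"Welch:  t = {t_w:.3f}, d.f. = {df_w:.1f}")
```

With equal group sizes the two t values coincide, while the Welch d.f. drops from 18 to roughly 12 for these SDs.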

If it is possible to test each patient with both drugs, with a suitable washout period in between, you could potentially get much better resolution in estimating changes, using a paired t test. It is likely you would need fewer subjects because a lot of the variation in blood pressure is between-subjects.
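A rough sketch of why a paired design can need fewer subjects. It assumes each subject is measured once under each treatment and that the two measurements on the same subject have correlation `rho`; that correlation is not given anywhere in the thread, so the values below are purely hypothetical. The SDs of 4 and 10 are the pilot values, the 5 mmHg target is one of the candidates mentioned above, and the calculation again uses a normal approximation.

```python
import math
from statistics import NormalDist


def n_paired(delta, sd1, sd2, rho, alpha=0.05, power=0.80):
    """Approximate number of subjects for a paired comparison, where each
    subject receives both treatments (normal approximation)."""
    z = NormalDist()
    z_sum = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    # Variance of the within-subject difference; a positive correlation
    # between the two measurements on one subject shrinks it.
    var_diff = sd1 ** 2 + sd2 ** 2 - 2 * rho * sd1 * sd2
    return math.ceil(z_sum ** 2 * var_diff / delta ** 2)


# rho values are hypothetical; they would have to be estimated from data.
for rho in (0.0, 0.5, 0.8):
    print(f"rho = {rho}: about {n_paired(5, 4, 10, rho)} subjects in total")
```

Even at `rho = 0` the paired design uses each subject twice, so the total number of subjects is about half that of a parallel-group study; the stronger the between-subject component of the variation, the larger `rho` and the smaller the study.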

As always, I offer the suggestion of contacting a statistician nearby (e.g., in a university statistics or biostatistics department) who can sit down and talk with you about your experiment, and ask you a lot of questions so that s/he is sure you get everything right. This is NOT easy stuff; there is a lot more to it than just juggling numbers.

Russ


Başar Erdivanlı

Aug 12, 2014, 12:42:42 PM
to piface-d...@googlegroups.com, berdi...@gmail.com, russel...@uiowa.edu


Dear Professor Lenth,

Thank you very much for your response.

As you guessed, the mean of the second group is higher. My R skills are not enough to graph the means of systolic and diastolic blood pressures of the groups as I'd like to. Therefore I made a picture and attached it.
The fact is, both drug combinations are used during surgery and are satisfactory.

After the usual anesthetics, one group receives an infusion of a very short-acting opioid, which relieves pain, slows the heart and drops the blood pressure by dilating the blood vessels. It is very titratable and its effect goes away within 10 minutes after I stop the infusion. At times, when the blood pressure drops too low, this is a very handy feature.
The other group receives a mild opioid painkiller and a second drug which slows the heart and drops the blood pressure by making the heart contract with less force (instead of dilating the blood vessels). Both of these drugs are given as a bolus dose, take slightly longer to affect the blood pressure (just 10-15 minutes more) and have longer half-lives (8-12 hours). Therefore, their effect lasts for 5-6 hours after the patient wakes up. This is a great advantage.

The SDs are correct. The group which receives the infusion (pink area) has very stable blood pressures like 91/54, 89/52, 90/53 mmHg, while the other group (green area) has slightly higher blood pressures with slightly more variation like 95/56, 91/53, 96/48, etc.
The surgeons and I are happy with both combinations. But I have to admit that everyone (surgeon or anesthesiologist) would prefer the infusion, which has both a very short time to effect and a short half-life and is titratable according to response, although it is more costly.

My original intent is to see whether they are equally effective in lowering the blood pressure. Of course, since the only aim of lowering the blood pressure is to decrease bleeding and provide a clear field to the surgeon, other measures are taken into account: total time spent/lost for aspirating the blood in the surgical field, post-operative bleeding, post-operative nausea due to ingested blood, etc. I surely will discuss these matters and the advantages and disadvantages of both combinations, like costs, conveniences and unique advantages.

As for the blood pressure, however, a difference of less than 10 mmHg is of no significance. The rationale behind this is that the infusion is able to provide stable mean blood pressures of 50 mmHg. And the surgeons love that drug. Even if the blood pressure or bleeding is higher than expected, the mere sight of the drug is enough to make them happy. If the mean blood pressure in the other group does not exceed 60 mmHg, the surgeons tolerate that combination very well. This is a somewhat psychological barrier and resembles the established convention regarding how much data is "enough", which you mentioned in chapter 4 (What to do if you have no choice about sample size) in your paper "Some practical guidelines for effective sample size determination".

My last question (combined SD) was about chapter 3 of your paper (Finding the right variance). If I understood correctly, there you mention that we have to be careful about possible sources of variation such as sex, age, risk factors, etc.
Therefore, I wonder whether we should find a separate SD for each drug's effect on blood pressure, the surgical aggressiveness of each surgeon, etc., and combine them in some way to find a single SD to use in the sample size calculation. I would be very happy if you would comment on this, since I am very confused about it. I mean, it is almost impossible to limit the source of variation to just a single parameter in human studies. So what should we do if we cannot take the paired t test approach (which is very wise indeed)?

Basar
bp.gif

Lenth, Russell V

Aug 12, 2014, 4:28:47 PM
to Başar Erdivanlı, piface-d...@googlegroups.com

Answering the last question first, the concerns about combining SDs come about when you are using a study of a different design to estimate the SDs needed for the sample-size calculation. In your case, the pilot study is just like the one you're planning, so the different contributors to variation are already present in the right balance.

 

I guess I understand that the SDs are small because both treatments do well in controlling diastolic blood pressure and that you apparently have a target value that you are trying to bring everybody to. In fact, there are elements of what you say that suggest that you can basically control the mean, and the question is how well you can do it. If that's the case, a statistical comparison of the means is kind of meaningless because you are basically choosing it. The bigger story might be in the variation - less variation means better control. But maybe the comparison of means is the right thing to do. And the picture leads me to ask how many times you are measuring each patient - and if more than once, then at what times, and what do you do with the results (average them together?). I guess if the goal is to compare means, and you have one measurement, or one summary statistic, per patient, then you can just use the t methods with the SDs you've observed and a target effect size of 10, and you'll be in fairly good shape.
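As a rough check on "fairly good shape", the sketch below approximates the power of a two-sided comparison of means for a given per-group sample size. It assumes equal group sizes and replaces the t distribution with a normal approximation, so it is somewhat optimistic for small groups. The SDs of 4 and 10 and the 10 volunteers per group are the pilot figures from the thread; the function name `approx_power` is illustrative.

```python
import math
from statistics import NormalDist


def approx_power(delta, sd1, sd2, n_per_group, alpha=0.05):
    """Approximate power of a two-sided two-sample comparison of means
    (normal approximation, equal group sizes, unequal SDs allowed)."""
    z = NormalDist()
    se = math.sqrt((sd1 ** 2 + sd2 ** 2) / n_per_group)  # SE of the difference
    z_crit = z.inv_cdf(1 - alpha / 2)
    shift = delta / se
    # Probability of rejecting in either tail when the true difference is delta.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


for n in (10, 16, 26):
    print(f"n = {n} per group: "
          f"power for 10 mmHg = {approx_power(10, 4, 10, n):.2f}, "
          f"for 8 mmHg = {approx_power(8, 4, 10, n):.2f}")
```

For 10 per group and a difference of 8 mmHg this approximation gives roughly 0.65, somewhat above the 58% quoted in the first message, which is the direction of error expected when the t distribution is replaced by the normal.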

 

Russ

Başar Erdivanlı

Aug 16, 2014, 6:35:45 PM
to piface-d...@googlegroups.com, berdi...@gmail.com, russel...@uiowa.edu
Dear Professor Lenth,

Your suggestion of comparing the variation worked marvelously. I measured those blood pressures every 3 minutes, and I noticed that the variation was almost always the same except at two time-points. And the increase in the variation is very rational. I don't want to disturb you with details, but one time-point is just after the injection of a local anesthetic containing adrenaline (adrenaline is frequently mixed with local anesthetics to cause collapse of the adjacent blood vessels. This way, the local anesthetic injected into the skin stays there. But the same adrenaline may increase the heart rate and blood pressure if it gets mixed into the blood).

I did the comparison by applying var.test at each time point (thanks to the functional aspects of R). But I suppose that I should compare the means with a one-way repeated measures ANOVA. I spent the last few days refreshing my knowledge about this test, but couldn't decide. As far as I learned in the courses and forums, one should always start with the basic statistical tests. And only if these basic tests yield a statistical difference should one proceed with more complex statistical tests.
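A small sketch of the per-time-point variance comparison described above, written in plain Python rather than R. It computes only the ratio of sample variances and its degrees of freedom, which is the F statistic that R's var.test reports; turning that into a p-value needs an F distribution, which the Python standard library does not provide. All readings below are made-up illustrative numbers, not data from the study, and the names `variance_ratios`, `infusion` and `bolus` are hypothetical.

```python
from statistics import variance


def variance_ratios(group_a, group_b):
    """For each time point present in both groups, return
    (F, numerator d.f., denominator d.f.) with F = var(group_b) / var(group_a).

    group_a, group_b: dict mapping time (minutes) -> list of readings,
    one reading per patient.
    """
    results = {}
    for t in sorted(group_a.keys() & group_b.keys()):
        var_a = variance(group_a[t])  # sample variance (n - 1 denominator)
        var_b = variance(group_b[t])
        results[t] = (var_b / var_a, len(group_b[t]) - 1, len(group_a[t]) - 1)
    return results


# Hypothetical diastolic readings (mmHg) at 0, 3 and 6 minutes.
infusion = {0: [54, 52, 53, 55], 3: [53, 52, 54, 53], 6: [52, 53, 54, 53]}
bolus = {0: [56, 53, 48, 57], 3: [55, 50, 58, 52], 6: [60, 49, 55, 51]}

ratios = variance_ratios(infusion, bolus)
for t, (f_stat, df_num, df_den) in ratios.items():
    print(f"t = {t} min: F = {f_stat:.2f} on ({df_num}, {df_den}) d.f.")
```

Because one test is run per time point, the individual results are best read as a description of where the variation differs rather than as independent confirmations.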

I know that repeated measures ANOVA is used to compare changes in mean scores over three or more time points. ANOVA stands for analysis of variance, and ANOVA compares the amount of systematic variation. However, as I understood it, ANOVA just allows us to remove the bias caused by the variability due to subjects. So I am puzzled a bit.

When I did the var.test, I passed the averaged mean blood pressures of each group at a specific minute. Does repeated measures ANOVA provide me the same thing, extended across the measurements? Or does it just compare the means? I always check a histogram or boxplot whenever suitable before seeing the p-value. And I always see that a t test or ANOVA yields a significant p-value if the ranges of the two groups do not overlap. So I guess all ANOVA is doing is comparing the mean and SD. If one group's variation is very big and covers the whole plot of the other group, even if the other group's mean and variation are very different, the p-value is not significant.

Finally, I am sorry if all these questions are so fundamental that I should have known the answers by now. I finished many statistics courses on Coursera, and benefited a lot from forums like Cross Validated. But the deeper I dig, the harder it gets to comprehend the answers.

Basar