Power Analysis for Secondary Data Analysis without Primary Data on hand


Mehvish

Apr 26, 2011, 3:24:26 AM
to MedStats, nilofe...@hotmail.com

Hello!

Kindly help me with a power calculation in the following situation.

I have to calculate the power of the tests for an analysis of
secondary data, using the published findings from the primary data.
However, the published findings of the primary data do not match the
objectives of the secondary analysis.

What I know about power analysis is that it primarily requires the
sample size and the effect size for the stated hypothesis, and the
hypothesis follows from the stated objective.

Please tell me how I can calculate the power of the test in this
situation, when the findings of the primary data do not match the
objectives of the secondary analysis.

It's urgent, please.

Regards,

Mehwish Hussain

BXC (Bendix Carstensen)

Apr 26, 2011, 3:43:13 AM
to meds...@googlegroups.com
Mehwish,
I think you are much too vague here to get a useful reply.

In general, if you want to compute the power of a test, you need to supply enough information to simulate a dataset like the one you will obtain from your experiment.

So the simple solution to the power calculation is to simulate, say, 1000 datasets like the one you anticipate, run the analysis you anticipate on each, and then see what fraction of the tests are significant. This also has the advantage that you can take the entire procedure you intend to follow into account (such as variable selection etc.). A further advantage is that you will already have done the programming needed for your analysis.

The disadvantage is that all your assumptions become painfully explicit.
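
A minimal sketch of this simulation approach, for the simplest case of comparing the means of two groups. Everything specific here is an assumption made for illustration: the function name `simulate_power`, normally distributed outcomes with standard deviation 1 in both groups, and a large-sample z-test standing in for whatever analysis is actually planned.

```python
import random
from statistics import NormalDist


def mean_and_var(xs):
    """Sample mean and unbiased sample variance of a list of numbers."""
    m = sum(xs) / len(xs)
    v = sum((x - m) ** 2 for x in xs) / (len(xs) - 1)
    return m, v


def simulate_power(n1, n2, effect_size, n_sims=1000, alpha=0.05, seed=1):
    """Estimate power as the fraction of simulated datasets with p < alpha.

    Assumed (hypothetical) data model: both groups are normal with SD 1,
    and the group means differ by `effect_size` standard deviations.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    significant = 0
    for _ in range(n_sims):
        g1 = [rng.gauss(0.0, 1.0) for _ in range(n1)]
        g2 = [rng.gauss(effect_size, 1.0) for _ in range(n2)]
        m1, v1 = mean_and_var(g1)
        m2, v2 = mean_and_var(g2)
        # Large-sample z statistic for the difference in means
        z = (m2 - m1) / (v1 / n1 + v2 / n2) ** 0.5
        if abs(z) > z_crit:
            significant += 1
    return significant / n_sims
```

For example, `simulate_power(657, 657, 0.15)` would estimate the power for an assumed difference of 0.15 standard deviations with the 1314 subjects split evenly. The data model and the test inside the loop are exactly the assumptions that have to be made explicit.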

Best regards,
Bendix
_________________________________________

Bendix Carstensen
Senior Statistician
Steno Diabetes Center A/S
Niels Steensens Vej 2-4
DK-2820 Gentofte
Denmark
+45 44 43 87 38 (direct)
+45 30 75 87 38 (mobile)
b...@steno.dk www.biostat.ku.dk/~bxc
www.steno.dk


Mehwish Hussain

Apr 26, 2011, 6:41:43 AM
to meds...@googlegroups.com
Sorry, Bendix,

Let me explain the situation I am facing.

A PhD candidate came to me for the analysis of a dataset. She is a nutritionist, and she has to submit her synopsis within this week.

Her major objective is to identify the dietary patterns of older adults (age >= 40) in an urban area of a low-income country. Then she will look for associations between those dietary patterns and different factors such as age, sex, systolic and diastolic blood pressure, lifestyle factors, physical activity etc.

I told her earlier about the statistical techniques applicable to each objective, such as the t test, correlation, regression etc.

The organization that holds the data told her that they will provide the primary data if she can tell them the power of the test for each objective.

So she came to me to calculate the power of the test for each objective.
She only knows that the sample size of the primary data was 1314.

I asked her to provide me with any statistics so that I could calculate the power. She sent me 3 major publications based on the primary data and told me to look for the statistics in the articles myself.

Now I am quite confused, because the objectives of those publications do not match the objectives of her PhD (mentioned above).

Although the means of systolic and diastolic BP, age etc. are given, no association with dietary patterns is reported.

Bendix! The idea of simulation is very good, but she is pressuring me to do this by today or tomorrow.

I am unable to do a simulation in such a short time because of my full-time job and studies.

But I have to do this power analysis because, according to her, someone told her that a STATISTICIAN CAN DO ANYTHING. :(
--
Regards

Mehwish Hussain

Thompson,Paul

Apr 26, 2011, 8:48:12 AM
to meds...@googlegroups.com
In my opinion, it is the responsibility of the PI (in this case, the nutritionist) to define the problem.  For a power analysis, that involves a) means or parameters that you EXPECT/BELIEVE will come from the experiment b) measures of variability.  At that point, you can define a power analysis.

While she may believe that the parameters are in these articles, she must wish to do something else, because just redoing the exact work of the articles is seldom the objective.

Request that she define the problem more clearly.  The deadline is not your problem.


Mitchell Maltenfort

Apr 26, 2011, 9:01:41 AM
to meds...@googlegroups.com
>
> But I have to do this power analysis because, according to her, someone told
> her that a STATISTICIAN CAN DO ANYTHING. :(

This is why Superman keeps a secret identity.


More usefully, check
http://www.psycho.uni-duesseldorf.de/abteilungen/aap/gpower3 -- it's a
tool for power analyses, free, and it gives you the sensitivity for
each test for a given N, type I and type II error.

You can give hissy missy a power analysis for each relevant test in a
handful of minutes.

She can chew her own food from there.

BXC (Bendix Carstensen)

Apr 26, 2011, 9:19:27 AM
to meds...@googlegroups.com
This seems to me the way to get it wrong.
If there is no precise specification of the problem, you are giving an answer to an irrelevant question.
And corroborating the notion that statisticians can do anything, in this case dreaming up data where none is available.

If you have a precise specification of the problem it will almost invariably not fit in the power calculators, therefore, simulate. This also has the desirable side-effect of letting you know the likely precision of your estimates, which are the relevant quantities.

Otherwise you might as well be honest and admit that the sample size (as in 53.8% of all applications) is in reality computed as the obtainable budget divided by the unit price.

Best regards,
Bendix


Swank, Paul R

Apr 26, 2011, 10:45:56 AM
to meds...@googlegroups.com

It seems to me that with the sample size and knowledge of the comparison to be made, it would be possible to determine the minimum effect size detectable for that test and sample size. So, for gender, for example, if you know the distribution of gender from the primary data, then you should easily be able to compute the minimum effect size you could detect with that sample.
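
As a rough sketch of that calculation, the minimum detectable standardized difference between two group means can be approximated in closed form. The function name `min_detectable_effect`, the normal approximation, the defaults of alpha = 0.05 and 80% power, and the even 657/657 split of the 1314 subjects are all assumptions for illustration.

```python
from statistics import NormalDist


def min_detectable_effect(n1, n2, alpha=0.05, power=0.80, two_sided=True):
    """Smallest standardized mean difference detectable with the given power.

    Uses the normal approximation d = (z_alpha + z_beta) * sqrt(1/n1 + 1/n2).
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    z_beta = z.inv_cdf(power)
    return (z_alpha + z_beta) * (1 / n1 + 1 / n2) ** 0.5


# Hypothetical even split of the 1314 subjects into two groups
print(round(min_detectable_effect(657, 657), 3))
```

Replacing the even split with the actual group sizes from the primary data gives the figure for a specific comparison such as gender.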

 

Dr. Paul R. Swank,

Professor and Director of Research

Children's Learning Institute

University of Texas Health Science Center-Houston

 


Steve Simon, P.Mean Consulting

Apr 26, 2011, 11:27:39 AM
to meds...@googlegroups.com, Mehwish Hussain
On 4/26/2011 5:41 AM, Mehwish Hussain wrote:

> The organization that holds the data told her that they will provide
> the primary data if she can tell them the power of the test for each
> objective. So she came to me to calculate the power of the test for
> each objective. She only knows that the sample size of the primary
> data was 1314.

One could argue that a power calculation is irrelevant when the sample
size is fixed. But the organization is unlikely to buy this.

As I describe in
* http://www.pmean.com/01/power.html
you need three things for a power calculation. A research hypothesis, a
standard deviation of your outcome measure, and the minimum clinically
important difference.

The research hypothesis is easy in your case. The standard deviation is
harder because the papers that you have gotten, as I understand it, do
not have a standard deviation for the outcome she is interested in, but
rather for other outcome measures.

But it is a very rare outcome measure that has never had anything
published about it. Surely some paper somewhere can provide you with
that standard deviation. It's not like she's inventing totally new
outcome measures that have never been studied by anyone before.

The populations that these outcome measures were used on, of course, are
probably quite different from the one she is proposing to use. That's
always going to be an issue. Try to find a population that is not too
radically different, but keep in mind that this will always be an
imperfect fit.

As a worst case scenario, you can use a SWAG (look it up). If you know
the range of your data, that can give you a rough idea of how big your
standard deviation might be. It's impossible for a standard deviation to
be 500, for example, if your data lies between 0 and 10.

Here's an example. Your outcome measure is birthweight. The tiniest
babies are about 500 grams, anything smaller is not viable. The biggest
babies are about 5000 grams, as human females are not big enough to give
birth to babies much larger than this. The range is 4500 grams. Divide
by 4 or 6 to get an approximate standard deviation. Here it would be 1125
or 750. Now your situation might produce a smaller standard deviation
(maybe much much smaller), but it would be really hard to get a larger
standard deviation, because physiology places limits on the variability
of birthweights. What this means is that your estimate of sample size
might be a bit too big if you use a standard deviation of 750 or 1125.
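
The arithmetic of this range-based guess can be written down in a couple of lines. The helper name `sd_from_range` is made up for illustration; the 500 and 5000 gram limits are the ones from the birthweight example.

```python
def sd_from_range(low, high, divisors=(4, 6)):
    """Rough standard deviation guesses: the range divided by 4 and by 6."""
    span = high - low
    return {d: span / d for d in divisors}


# Birthweight example: plausible values run from about 500 to 5000 grams
print(sd_from_range(500, 5000))  # {4: 1125.0, 6: 750.0}
```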

Now I rarely use a SWAG, but keep in mind that there are not any serious
ethical problems with a sample size that is too big (see below). It's
not like you are asking a bunch of patients to undergo a needless
medical test or forcing half of your patients to forgo the active
medicine for a placebo. The data is already collected. So you can't
argue that it is unethical to have too large a sample size here.

The minimum clinically important difference here is actually not too
hard. You know that a t-test with 1,314 subjects can detect a pretty
small difference (about a tenth of a standard deviation, assuming equal
sample sizes in each group). So figure out whether a tenth of a standard
deviation is "small enough". It's probably too small, but there is no
problem here.

The only issue that might come up is if you have a binary outcome and
the event in question is extremely rare. There is a rule of 50 that says
that if you are comparing the probability of an event in two groups, you
want to have about 25 to 50 events in each group. You should be safe if
all your rates are 5% or more.
* http://www.pmean.com/01/quick.html
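
A quick way to check this rule is to multiply each group size by its anticipated event rate. This is only a sketch: the function names, the lower threshold of 25 events, and the even 657/657 split of the 1314 subjects are assumptions for illustration.

```python
def expected_events(group_sizes, event_rates):
    """Expected number of events in each group (size times rate)."""
    return [n * p for n, p in zip(group_sizes, event_rates)]


def enough_events(group_sizes, event_rates, minimum=25):
    """True if every group is expected to have at least `minimum` events."""
    return all(e >= minimum for e in expected_events(group_sizes, event_rates))


groups = (657, 657)  # hypothetical even split of the 1314 subjects
print(enough_events(groups, (0.05, 0.05)))  # about 33 events per group
print(enough_events(groups, (0.02, 0.02)))  # about 13 events per group
```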

Seriously imbalanced sample size in the treatment and control group (as
could happen if you are looking at a rare subgroup) will complicate
things. You lose a lot of power when the data is divided more extremely
than an 80-20 split.
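
One way to put a number on this loss, assuming a comparison of two means with equal variances: the variance of the difference is proportional to 1/n1 + 1/n2, so a split with a fraction p in one group is worth a balanced study of 4p(1-p) times the total size. The function name below is made up for this sketch.

```python
def effective_balanced_n(total_n, fraction_in_one_group):
    """Total size of a balanced study with the same precision.

    Follows from 1/n1 + 1/n2 = 1 / (N * p * (1 - p)), which equals 4/N
    when p = 0.5.
    """
    p = fraction_in_one_group
    return 4 * p * (1 - p) * total_n


for p in (0.5, 0.2, 0.1):
    print(p, round(effective_balanced_n(1314, p)))
```

Under these assumptions an 80-20 split keeps 64% of the balanced sample's information, and a 90-10 split keeps only 36%.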

More complicated data analyses, such as ANOVA or regression are unlikely
to be a problem though.

Steve Simon, n...@pmean.com, Standard Disclaimer.
Sign up for the Monthly Mean, the newsletter that
dares to call itself average at www.pmean.com/news

Mitchell Maltenfort

Apr 26, 2011, 11:50:19 AM
to meds...@googlegroups.com
On Tue, Apr 26, 2011 at 9:19 AM, BXC (Bendix Carstensen) <b...@steno.dk> wrote:
> This seems to me the way to get it wrong.

I stand by the classic formula of "you can get it good, fast or cheap.
Pick two."


The nutritionist seems to have made her choice.

Peter Flom

Apr 26, 2011, 11:53:53 AM
to meds...@googlegroups.com
Mitchell Maltenfort wrote

<<<
I stand by the classic formula of "you can get it good, fast or cheap.
Pick two."
>>>

Two? Most times, one is a stretch. :-)

Peter

Mehwish Hussain

Apr 26, 2011, 12:02:18 PM
to meds...@googlegroups.com

Thank you so much for your kind help! It is really appreciated.

Dr. Thompson! After your post, she agreed to search her literature for the statistics, but I still have to tell her which statistic(s) are required.

Dr. Mitchell! Indeed, I am using the same software for power analysis. I told her to meet me on Friday; I will do the power analysis in front of her so that she may believe that we cannot calculate the power of the test from N (the sample size) alone.

Sir Bendix! I did not understand your last lines about calculating the sample size. Probably I don't know the method you mentioned.

Dr. Swank! How would I detect the minimum effect size from the gender distribution? I can tell you there were 551 males in the data; the rest were females.

Swank, Paul R

Apr 26, 2011, 12:38:27 PM
to meds...@googlegroups.com

By simulating two groups with ns of 763 and 551, and varying the effect size from 0 to .3 in steps of .001, I find that an effect size of .14 standard deviations gives .80453 power with a one-tailed test, and an effect size of .16 standard deviations gives .81590 power with a two-tailed test. These would be small effects in almost everyone’s book. For example, in my line of work we find typical effect sizes for preschool interventions to be around .30 standard deviations.
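
A scan of this kind can be sketched as follows. It is not the simulation used for the figures above: it swaps in a normal approximation to the power of a two-group comparison, with group sizes of 551 and 763 (the 551 males out of 1314 reported earlier in the thread). The function names and defaults are assumptions for illustration.

```python
from statistics import NormalDist


def approx_power(d, n1, n2, alpha=0.05, two_sided=True):
    """Normal-approximation power for a standardized mean difference d."""
    z = NormalDist()
    se = (1 / n1 + 1 / n2) ** 0.5
    z_crit = z.inv_cdf(1 - alpha / 2) if two_sided else z.inv_cdf(1 - alpha)
    # Probability of rejecting in the direction of the true effect; the
    # tiny chance of rejecting in the wrong direction is ignored.
    return z.cdf(d / se - z_crit)


def smallest_detectable(n1, n2, target=0.80, step=0.001, max_d=0.3, **kwargs):
    """Scan effect sizes from 0 to max_d and return the first reaching target."""
    for i in range(round(max_d / step) + 1):
        d = i * step
        if approx_power(d, n1, n2, **kwargs) >= target:
            return d
    return None


print(smallest_detectable(551, 763, two_sided=False))  # one-tailed
print(smallest_detectable(551, 763, two_sided=True))   # two-tailed
```

The approximation should land close to the simulated values, though small differences in the third decimal are to be expected.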

Mehwish Hussain

Apr 26, 2011, 12:43:08 PM
to meds...@googlegroups.com
 
Professor Mean!

Thank you so much. The links you provided are really helpful. Your suggestions are also great!!!

I will surely follow them in other research projects and will refer to them in my lectures too.

However, I think I will not be able to adopt this strategy for her research, because she asks for verification or a reference for each and every step of the analysis. If I use your suggested calculation of the standard deviation, she will not believe my calculation, and I will have to redo the analysis after searching the literature.

Well, she has now agreed to do a literature review. I am just afraid that if the required statistic is not found in her literature review, I will again have to call on all of you for help. :(

May God help me!!!!
--
Regards

Mehwish Hussain
