Will Reproducibility Project unearth "an excess of significant findings"?
Roger Giner-Sorolla  
From: Roger Giner-Sorolla <rogersebast...@gmail.com>
Date: Mon, 21 May 2012 06:03:42 -0700 (PDT)
Local: Mon, May 21 2012 9:03 am
Subject: Will Reproducibility Project unearth "an excess of significant findings"?

Recently, Gregory Francis has had at least two papers applying the methods
of Ioannidis & Trikalinos to articles in psychology: one looking at Bem's
2011 JPSP precognition article

Francis, G. (2012). Too good to be true: Publication bias in two prominent
studies from experimental psychology. *Psychonomic Bulletin & Review*, *19*,
1–6.

and another looking at the closeness of desirable objects effect from
Balcetis and Dunning

Francis, G. (2012). The same old New Look: Publication bias in a study of
wishful seeing. *i-Perception*, *3*(3), 176–178.

which elicited a reply and an exchange from the authors (linked from
http://i-perception.perceptionweb.com/journal/I/volume/3/article/i0519ic)

In theory this could go on forever (a search reveals he has another one in
press at JEP:General) and of course the "hit list" approach doesn't leave
us with very firm grounds for discipline-wide generalizations about
false-positive bias. My personal reaction is, why go after results one at a
time if this so obviously reflects an endemic practice in the field?

Indeed, it occurred to me that we could do a lot better by looking at our
sample of the psychology field from the Reproducibility Project. We would
have to do power analyses for all studies in each article, of course, but
we already have the framework in place. Although I have my own archival
project I want to start this summer, I'm wondering if anyone could take on
organizing this project, or at least give their opinion as to whether it's
worth doing?


 
Roger Giner-Sorolla
From: Roger Giner-Sorolla <rogersebast...@gmail.com>
Date: Mon, 21 May 2012 06:17:06 -0700 (PDT)
Local: Mon, May 21 2012 9:17 am
Subject: Re: Will Reproducibility Project unearth "an excess of significant findings"?

Oh, and some important context: Ioannidis & Trikalinos (2007)

Ioannidis, J. P. A., & Trikalinos, T. A. (2007). An exploratory test for an
excess of significant findings. *Clinical Trials*, *4*(3), 245–253.

note that for a multi-study article, if all results reported are
significant, the likelihood can be calculated that a complete report of all
studies done would have come up with a similar result, given the studies'
experimental power. For example, if 5 studies are run a priori, each at .80
power to detect the effect size eventually found, then even if the
aggregate sample effect size is a true estimate of the population effect
size, it is likely that at least one study will come out with p > .05, and
there is only a ~33% chance that they will all be significant.
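
The arithmetic behind that ~33% figure can be sketched in a few lines. This is only an illustration that assumes the studies are independent; the function name prob_all_significant is made up for this example and does not come from Ioannidis & Trikalinos.

```python
def prob_all_significant(powers):
    """Probability that every study reaches significance.

    Assumes the studies are independent, so the joint probability
    is simply the product of the individual power values.
    """
    prob = 1.0
    for power in powers:
        prob *= power
    return prob


# Five studies, each run at .80 power (the example from the post).
five_studies = [0.80] * 5
p_all = prob_all_significant(five_studies)
p_at_least_one_miss = 1.0 - p_all

print(round(p_all, 3), round(p_at_least_one_miss, 3))
```

With five studies at .80 power, the product is 0.8^5, or about 0.33, so roughly two times out of three at least one study should miss significance.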


 
Joachim Vandekerckhove
From: Joachim Vandekerckhove <joachim.vandekerckh...@gmail.com>
Date: Mon, 21 May 2012 14:58:57 -0700 (PDT)
Local: Mon, May 21 2012 5:58 pm
Subject: Re: Will Reproducibility Project unearth "an excess of significant findings"?

Hi Roger,

I haven't been active in this group, but I am doing exactly this right now.
An RA is collecting all the relevant statistics now and we plan to write a
report by the end of the summer. I'm also meeting Greg soon to discuss the
implementation.
I think it would be a worthwhile effort not only in order to provide an
overview of observed power (post hoc) in the sample, but also to provide
power-based advice to groups aiming to replicate the studies. I'm happy to
communicate more on this if there is an interest.

Cheers,
Joachim


 
Roger Giner-Sorolla
From: Roger Giner-Sorolla <rogersebast...@gmail.com>
Date: Tue, 22 May 2012 02:29:59 -0700 (PDT)
Local: Tues, May 22 2012 5:29 am
Subject: Re: Will Reproducibility Project unearth "an excess of significant findings"?

That's great, Joachim. Are you using the Reproducibility Project sample of
articles specifically?

And some power-related questions: what are you doing for studies that
report no effect size (all too common in Psych Science), and repeated
measures studies that report no correlation among DVs, which is needed for
ES of the repeated measures ANOVA effects?


 
Gregory Francis
From: Gregory Francis <gfrancissw...@gmail.com>
Date: Tue, 22 May 2012 05:13:20 -0700 (PDT)
Local: Tues, May 22 2012 8:13 am
Subject: Re: Will Reproducibility Project unearth "an excess of significant findings"?
The main reason to apply the technique to individual findings is that
some scientists care about those individual findings. People who care
about these phenomena should know when the reported studies do not
provide proper evidence for the stated claims.

By the way, I have a letter that just appeared in PNAS

http://www.pnas.org/content/early/recent

The authors' reply strikes me as mostly nonsense, but you can judge
for yourself.

There is already pretty convincing evidence that this kind of bias
exists across the field, but this kind of general characterization
does not seem to have had much impact (not much has changed from
Sterling's observations in the 1950s). Perhaps a more "personal"
approach will be more effective.

By the way, I do not have a "hit list", nor do I think the authors of
the work I have criticized are behaving worse than most researchers.
The problems are difficult and systemic. Most of us are running and
reporting experimental findings incorrectly.

I think someone should apply the approach to the articles in the
Reproducibility Project. If the reported findings are unbelievable,
then I think there is no reason to do the replication (unless you
happen to care about the topic). It's more than one person can do on
their own, but I would be happy to help.

-Greg Francis

On May 21, 9:03 am, Roger Giner-Sorolla <rogersebast...@gmail.com>
wrote:


 
Brian Nosek
From: Brian Nosek <no...@virginia.edu>
Date: Tue, 22 May 2012 09:34:43 -0700
Local: Tues, May 22 2012 12:34 pm
Subject: Re: [OpenScienceFramework] Re: Will Reproducibility Project unearth "an excess of significant findings"?

I think we have a good opportunity to look not just at power, but many
features of "standard practice" with our 2008 study sample.  Elizabeth
Bartmess and a couple of others are finishing up a simple web form to help
study coders complete and submit information about each of the studies.
 This ought to help standardize the coding process and lower the bar to
making contributions on that coding.  So, many people could make small
contributions.

There is a possibility of expanding this coding project to facilitate power
investigations (like what Joachim is starting) and others - estimating
average effect size, sample size, types of study designs, distributions of
p-values, appropriateness of statistical tests, conducting of conceptual
and direct replications (the archival project idea that Roger is
initiating), etc.  That is, if we do a very good job on a comprehensive
coding of 2008 papers, then there might be many projects that could use
that same dataset.

*Current*: The current coding project is focused on coding a few features
of a single study from each 2008 article from three journals.  This coding
is focused entirely on supporting the Reproducibility Project goals.

*Proposal*: We can boost the power of all of the current archival projects
by collectively amassing a large dataset of study characteristics.  And,
the amassed dataset would allow additional OSC (or independent)
investigations.  How about we formally separate the coding project from the
Reproducibility Project and expand it to:

(a) coding every study in each article

(b) coding all major features of study design and reporting - sample size,
effect size, statistical tests, replication or not, exclusion criteria,
hypothesis supported or not, what key information reported and what is not,
etc. (many possibilities here, we'd need to meet to discuss what is
essential and how to improve coding from present approach)

(c) broaden the journal base so that there can be formal comparisons across
subdisciplines - e.g., Journal of Abnormal Psychology, Developmental
Psychology, Journal of Cognitive Neuroscience, even outside of
psychology/neuroscience if there are folks with relevant expertise and
interest in the OSC

On Tue, May 22, 2012 at 5:13 AM, Gregory Francis <gfrancissw...@gmail.com>
wrote:



 
Ruben Arslan
From: Ruben Arslan <rubenars...@gmail.com>
Date: Mon, 21 May 2012 19:56:05 +0200
Local: Mon, May 21 2012 1:56 pm
Subject: Re: [OpenScienceFramework] Will Reproducibility Project unearth "an excess of significant findings"?

Hey all,

I've only lurked on this list before. I greatly enjoy reading about the progress, so
maybe I can contribute a little work myself now.

To me, this seems like an obvious target for crowdsourcing, because extracting the necessary
coefficients and making the analytic decisions (how to pool,...) could be
done quite easily by graduate students (maybe after some required reading).
It could also be done redundantly, so that we wouldn't have to worry too much about the
individuals' classifications.

I'd volunteer to do a simple web interface for entering the necessary information and possibly
sending out emails to resolve disagreement in the classifications, if the group
decides that it's a worthy effort. To me it seems interesting enough to offer to put in some work
and it would be nice to do it without putting too much emphasis on individual actions in a flawed system  :-)

Of course it may be overkill to do it as I suggested, but if it makes things any easier, I'd gladly do it.

Best wishes,
Ruben

--  
Ruben Arslan
Student assistant
Lab: http://www.psychology.hu-berlin.de/profship/perdev
Humboldt-University of Berlin
Unter den Linden 6
10099 Berlin, Germany

On 21.05.2012, at 15:03, Roger Giner-Sorolla wrote:


 
Jesse Chandler
From: Jesse Chandler <j.j.b.chand...@gmail.com>
Date: Tue, 22 May 2012 14:47:26 -0700 (PDT)
Local: Tues, May 22 2012 5:47 pm
Subject: Re: Will Reproducibility Project unearth "an excess of significant findings"?
Hi Greg,
I think it might actually make a stronger case if we try to reproduce
everything as planned, and then look at the relationship between power
and the probability of replication. I think we would find the
unsurprising result that underpowered findings do not replicate.

You might ask, "Well, why is this worth doing?" Two reasons. First,
part of this project was to look at the overall replicability of the
field, and it seems like including these low-powered studies is an
important part of this: the whole sampling strategy was designed around
picking a representative sample of what is published. Second, this
approach would allow us to address the argument that you seem to
encounter that "when one uses many different measures, cumulative
power tells us little." In principle, papers could be coded as a
series of direct or conceptual replications, and their cumulative
power regressed onto the probability of successful replication for the
selected experiment. It may in fact be the case that it is a weaker
predictor for heterogeneous studies, but that is a different question
from whether this method overall is a useful heuristic to assess the
believability of results.

On May 22, 8:13 am, Gregory Francis <gfrancissw...@gmail.com> wrote:


 
Joachim Vandekerckhove
From: Joachim Vandekerckhove <joachim.vandekerckh...@gmail.com>
Date: Tue, 22 May 2012 15:21:05 -0700 (PDT)
Local: Tues, May 22 2012 6:21 pm
Subject: Re: Will Reproducibility Project unearth "an excess of significant findings"?

Roger:

Yes, the idea was to use the exact same sample -- it seems like a nicely
unbiased sample from the literature.

Re: power: Those problems have yet to come up (since we're only just
starting this project), but the strategy I have in mind is to take the
following steps (in order): 1) try to use other reported statistics to
compute the ES and power (often possible with just t statistics, or with F
statistics if the means are reported as well); 2) contact authors to fill
in blanks; 3) use numerical methods to integrate out the unknown
parameter(s), and report best/worst cases. That last step would be
basically a Bayesian inference step with a prior over the unknown(s). I'd
also provide some code snippets that other people can use to supply their
own priors. There are definitely cases where some unknown varying over a
very reasonable range has only a marginal effect on the power estimate.
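
Steps 1 and 3 could be sketched roughly as follows. This is not Joachim's code: the function names, the normal approximation to the test statistic (an exact version would use the noncentral t distribution), the equal-SD assumption for the paired design, the uniform prior over the unreported correlation, and all numeric inputs are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def approx_power(noncentrality, alpha=0.05):
    """Two-sided power, using a normal approximation to the test statistic."""
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    return _STD.cdf(noncentrality - z_crit) + _STD.cdf(-noncentrality - z_crit)


def d_from_independent_t(t, n1, n2):
    """Step 1: recover Cohen's d from a reported independent-samples t."""
    return t * sqrt(1 / n1 + 1 / n2)


def paired_power(d, n, r, alpha=0.05):
    """Power of a paired test when the two measures correlate at r.

    Assumes equal SDs in both conditions, so d_z = d / sqrt(2 * (1 - r)).
    """
    d_z = d / sqrt(2 * (1 - r))
    return approx_power(d_z * sqrt(n), alpha)


def power_over_prior(d, n, r_grid, weights=None, alpha=0.05):
    """Step 3: average power over a discrete prior on the unknown correlation.

    Returns (prior-weighted mean, worst case, best case).
    """
    if weights is None:
        weights = [1.0] * len(r_grid)  # uniform prior by default
    powers = [paired_power(d, n, r, alpha) for r in r_grid]
    mean_power = sum(w * p for w, p in zip(weights, powers)) / sum(weights)
    return mean_power, min(powers), max(powers)


# Hypothetical inputs, for illustration only.
d = d_from_independent_t(t=2.5, n1=20, n2=20)
r_grid = [i / 20 for i in range(2, 19)]  # r = 0.10, 0.15, ..., 0.90
mean_power, worst, best = power_over_prior(d=0.4, n=30, r_grid=r_grid)

print(round(d, 3), round(mean_power, 3), round(worst, 3), round(best, 3))
```

Passing a different weights list to power_over_prior is how someone would supply their own prior in this sketch; the best and worst cases are just the extremes over the grid.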


 
Marcus Munafo
From: Marcus Munafo <marcus.mun...@bristol.ac.uk>
Date: Thu, 24 May 2012 09:48:27 +0100
Local: Thurs, May 24 2012 4:48 am
Subject: Re: [OpenScienceFramework] Re: Will Reproducibility Project unearth "an excess of significant findings"?

The use of observed power may be problematic here, because there's no
independent confirmation that the effect size is accurate (it may be
over-estimated). Observed power is simply another way of representing
the information contained in the observed effect size, sample size and
p-value for a given study.

Ioannidis and Trikalinos use the effect size from a meta analysis to
anchor their power calculation for individual studies, not the observed
power for each study, on the assumption that the meta analysis effect
size estimate will be a more accurate estimate of any true population
effect.
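
A rough sketch of that anchoring idea: pool the effect sizes, compute each study's power to detect the pooled effect, and compare the expected number of significant results with the observed number. The fixed-effect (inverse-variance) pooling, the normal approximation, the function names, and the four made-up studies are all assumptions for illustration; the formal test statistic from the Ioannidis and Trikalinos paper is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

_STD = NormalDist()


def pooled_effect(effects, variances):
    """Fixed-effect (inverse-variance weighted) pooled effect size."""
    weights = [1 / v for v in variances]
    return sum(w * e for w, e in zip(weights, effects)) / sum(weights)


def power_for_effect(effect, variance, alpha=0.05):
    """Two-sided power of one study to detect `effect` (normal approximation)."""
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    ncp = effect / sqrt(variance)
    return _STD.cdf(ncp - z_crit) + _STD.cdf(-ncp - z_crit)


def expected_significant(effects, variances, alpha=0.05):
    """Return (pooled effect, per-study power, expected number significant)."""
    pooled = pooled_effect(effects, variances)
    powers = [power_for_effect(pooled, v, alpha) for v in variances]
    return pooled, powers, sum(powers)


# Hypothetical data: three small studies and one large, precise study.
effects = [0.60, 0.55, 0.70, 0.05]
variances = [0.09, 0.10, 0.12, 0.01]

pooled, powers, expected = expected_significant(effects, variances)

# Count how many studies were individually significant at alpha = .05.
z_crit = _STD.inv_cdf(0.975)
observed = sum(abs(e) / sqrt(v) > z_crit for e, v in zip(effects, variances))

print(round(pooled, 3), [round(p, 3) for p in powers], round(expected, 2), observed)
```

In this made-up set, the large study dominates the pooled estimate and also has the greatest power against it, so an observed count well above the expected count would be the warning sign.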

In fact, in the presence of publication bias when an effect is not in
fact real, one would expect large studies to be the only ones showing
null results to actually get published. All the small studies showing
null (or opposite to predicted) results would be censored. This is the
rationale of funnel plot methods.

So if publication bias is operating within a field we would probably
expect the studies with the *greatest* power to detect a given effect to
be the ones likely to show no effect. This would not be reflected in an
observed power calculation. An illustration of that can be found in this
recent study:

http://www.ncbi.nlm.nih.gov/pubmed/22488255

The Viviani study has 68% power to detect the effect size estimate
indicated by the meta-analysis, but only 5% observed power when the
effect size estimate from the Viviani study itself is used (because the
effect size observed in that study was almost zero, and the p-value 0.99).
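
The 5% figure can be checked without any of the study's data, because observed power is a function of the p-value alone. The sketch below assumes a two-sided test at alpha = .05 and a normal approximation to the test statistic; the function name is made up for this example.

```python
from statistics import NormalDist

_STD = NormalDist()


def observed_power_from_p(p_value, alpha=0.05):
    """Observed (post hoc) power implied by a two-sided p-value.

    Uses a normal approximation: the p-value is converted back into
    the observed |z|, which is then treated as the true noncentrality.
    """
    z_obs = _STD.inv_cdf(1 - p_value / 2)
    z_crit = _STD.inv_cdf(1 - alpha / 2)
    return _STD.cdf(z_obs - z_crit) + _STD.cdf(-z_obs - z_crit)


for p in (0.99, 0.05, 0.001):
    print(p, round(observed_power_from_p(p), 3))
```

Under these assumptions a p-value of 0.99 gives observed power of about 5%, and a p-value of exactly .05 gives about 50%, whatever the sample size.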

The fact that the only adequately powered study in this literature
reported no effect (i.e., failed to replicate) is very important.
Studies which report no effect will have the lowest observed power by
definition, but it's often exactly these studies which tell us that
something untoward is going on.

Marcus.

On 22/05/2012 13:13, Gregory Francis wrote:

--

Marcus Munafò
Professor of Biological Psychology
School of Experimental Psychology
University of Bristol
12a Priory Road
BRISTOL BS8 1TU
United Kingdom

+44.117.9546841 t.
+44.117.9288588 f.

marcus.mun...@bristol.ac.uk

http://www.bris.ac.uk/expsych/people/academic/marcusmunafo.html
http://www.bris.ac.uk/expsych/research/brain/targ/


 
Gregory Francis
From: Gregory Francis <gfrancissw...@gmail.com>
Date: Fri, 25 May 2012 07:57:34 -0700 (PDT)
Local: Fri, May 25 2012 10:57 am
Subject: Re: Will Reproducibility Project unearth "an excess of significant findings"?
I generally agree with Marcus' descriptions of the relative
plausibility of finding evidence for publication bias based on
observed power versus power based on a pooling of effect sizes. If the
experiments are precise replications, then pooling the effect sizes is
definitely the way to go. However, I think there is still value to
using observed power, as long as one is careful.

For small experiment sets, the observed power analysis gives a huge
benefit of the doubt to the experiments, by supposing that the
reported effect size is valid (this is also true for the pooled effect
size analysis). If there is a bias, the reported effect size probably
grossly overestimates the true effect size. The result is that the
observed power is also an overestimate of true power. What this means
is that the analysis will miss many cases where bias does exist.  The
test is very conservative.

There is a different concern when using observed power (I discussed
this a bit in my rebuttal to the Piff reply; see my earlier comment
for the link). If there is no bias, and true power is bigger than 0.5,
then observed power tends to underestimate true power. This means that
a straight application of the approach will report bias where it does
not exist, and this problem gets excessively large as the number of
experiments under consideration increases. It is only a serious
problem for certain methods of choosing the sample size, but even a
worst case scenario (for the power analysis making a false positive
declaration of bias) needs to be considered. This is why I've always
run simulations to verify that the analysis was not likely to produce
a false positive for the experiments under consideration. For larger
experiment sets, one would need to do some kind of correction to try
to compensate for the worst case bias (I've not yet worked out exactly
how to do this).

-Greg

On May 24, 4:48 am, Marcus Munafo <marcus.mun...@bristol.ac.uk> wrote:


 
Gregory Francis
From: Gregory Francis <gfrancissw...@gmail.com>
Date: Fri, 25 May 2012 08:23:24 -0700 (PDT)
Local: Fri, May 25 2012 11:23 am
Subject: Re: Will Reproducibility Project unearth "an excess of significant findings"?
Jesse,

Certainly replication attempts will add some valuable data. I have to
confess that I am not clear exactly what outcome this project hopes to
reveal. I've gleaned a few possibilities, but maybe I misunderstand
some things.

1) Show that many reported phenomena in psychology do not replicate.
I'm pretty sure the project will be successful at this task. Indeed,
if it was not, I would charge bias in the replication attempts. Given
the power values of the original findings, a lot of experiments should
not replicate, regardless of whether the effect is real or not.

2) Show that across the field too many experiments do not replicate
(this seems to reflect the stated project goals). Establishing the
predicted number of replications would seem to require something like
the power analysis I have done, and this might be worthwhile. On the
other hand, I think the result will not be telling us anything new.
Sterling (1955) and Sterling et al. (1994) make a pretty good case for
this already. The problem is that those analyses do not indicate which
experimental findings are biased and which are not. There's plausible
denial for everyone, even though clearly a lot of findings must be
biased.

3) Show that a particular result does not replicate. This is not one
of the stated goals, and the project seems unsuited to do this
effectively. Bem's quote in the recent Nature article by Yong was
correct: a single failure to replicate is unlikely to settle the issue
about whether a finding is real or not. Indeed, a failure to replicate
can sometimes make a finding more believable (by avoiding the
appearance of bias). If the project motivates people to think about
statistics this way, then that would be a very good thing. However, I
suspect the initial reaction will be finger-pointing, accusations, and
denial.

I really do see some benefits to the reproducibility project, but I
fear that the findings will be misunderstood. On the other hand, given
the interest, there is a lot of opportunity for education.

Good luck,

-Greg

On May 22, 5:47 pm, Jesse Chandler <j.j.b.chand...@gmail.com> wrote:


 