The Greg Francis critiques

Simonsohn, Uri

Jun 5, 2012, 3:39:37 PM

Dear list,

A couple of weeks ago people were discussing the analyses that Greg Francis has been publishing testing for publication bias in individual papers, and the idea of using his implementation of the "excessive significance test" more broadly was floated around.


I took a close look at his papers and became worried that they may have some fairly serious problems. I have discussed this with Greg over the phone and by email, and we have not reached a consensus on the issue (Greg and I do agree that the other is confused).


I have drafted a 5-page document that summarizes my concerns, and I am thinking of submitting it to a journal.


Greg suggested I get feedback from others before doing so. Given the obvious relevance of publication bias tests to this mailing list, and the interest in Greg's analyses in particular, I thought I would make the early draft of that short paper available to members of the list. You can see it here


To avoid distracting the list, I think the best way to convey any comments would be to send them to me directly. I will then post a single document with the comments of people who actively indicate they would like their comments to be public. But this is an open forum, so this is merely a suggestion.






Uri Simonsohn

Associate Professor

The Wharton School



Jelte M. Wicherts

Jun 6, 2012, 4:55:50 AM
Dear Uri,

Your point about publication bias in studies of publication bias is well taken. The test by Ioannidis and Trikalinos (2007) indeed only indicates that there is a problem in a given set of studies; it does not indicate the severity of the bias in the estimate of the mean effect size. So it does not warrant any statements as to whether a finding is a false positive or not. Hedges' (1992) selection model can be used to estimate the bias in the context of meta-analysis, but this method requires quite large sets of studies.

Marjan Bakker and I just finished our contribution to Hal Pashler and EJ Wagenmakers' special issue for Perspectives, in which we randomly sampled psychological meta-analyses published in 2011, selected the most homogeneous subset of at least 10 primary studies in each, and applied the Ioannidis and Trikalinos test as well as Egger's test for funnel-plot asymmetry. We found problems in about half of the meta-analyses.

Marjan and I will probably share the PDF of our manuscript soon with the group. 

Best wishes,



Eric-Jan Wagenmakers

Jun 6, 2012, 7:14:43 AM
Dear Uri,

If I read your paper correctly, you assume that there exist several
(many?) series of experiments that pass the Ioannidis and Trikalinos
test for publication bias. The idea is that Greg Francis chose to
focus on the studies that did not.

However, my intuition (and the fact that approximately 95% of
published articles confirm their main hypothesis, see Sterling's work)
suggests that it will be difficult to find studies that pass the
Ioannidis and Trikalinos test. My main problem with Greg Francis's
results is the opposite of yours: I believe it is rather clear that
if the field as a whole is tainted by publication bias, then so are
the individual studies.

So perhaps your point has more force if you can point to specific,
more or less established phenomena where the test does *not* indicate
publication bias. I think this will not be as easy as you imply --
yes, there are many papers but Francis' analyses require more than
just two or three studies. Let's say we use a lower bound of six?

WinBUGS workshop in Amsterdam:
Eric-Jan Wagenmakers
Department of Psychological Methods, room 2.16
University of Amsterdam
Weesperplein 4
1018 XA Amsterdam
The Netherlands

Phone: (+31) 20 525 6420

“Man follows only phantoms.”
Pierre-Simon Laplace, last words

Joachim Vandekerckhove

Jun 7, 2012, 5:12:58 AM

The lower bound for detecting publication bias in a series of studies, given current conventions, would be n = 4, I think. Assume a series of studies, each showing just-significance at p approximately equal to alpha. Each then has power of about .5, and n = 4 is the smallest n for which .5^n falls below .1 (.5^4 = .0625).
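This arithmetic can be sketched directly (assuming the power-product form of the excess-significance criterion; the function name is mine, for illustration):

```python
def smallest_n_below_criterion(power=0.5, criterion=0.1):
    """Smallest number of studies, each with the given power, whose joint
    probability of all being significant falls below the criterion."""
    n, joint = 0, 1.0
    while joint >= criterion:
        n += 1
        joint *= power
    return n

print(smallest_n_below_criterion())  # .5^3 = .125 >= .1 but .5^4 = .0625 < .1, so 4
```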


Your argument about publication bias in Francis' publication-bias tests seems to assume that there are possible file-drawer effects relevant to hypotheses of the type "the Balcetis and Dunning paper shows publication bias" (as this is the claim made in the i-Perception paper, for example). I don't think there is such a file-drawer effect. In particular, no generalization claim is made.
Of course it is true that the Ioannidis-Trikalinos statistic is still a probability ("in the absence of publication bias, the probability of observing such a pattern is ..."), and even events with low probability happen sometimes, but that isn't really a secret. Certainly it doesn't invalidate the analyses Francis reports?
Also, I haven't read all of the papers, but I didn't come across any recommendation that data be destroyed; merely that the conclusions (the inference) should be considered in light of possibly file-drawered data.


Just to be sure I understand your post: Is your main point that evidence for publication bias does not equal evidence for a false-positive finding?

Gregory Francis

Jun 7, 2012, 4:18:07 PM
to Open Science Framework
Joachim and EJ,

Actually, two studies is enough, although it is a rather special
situation. Imagine there are two studies that are direct
methodological replications but that differ widely in sample size and
effect size.

For the first study:

n1 = 10, n2 = 10, t = 2.11, p = 0.0491, g = 0.904

For the second study:

n1 = 100, n2 = 100, t = 1.98, p = 0.0491, g = 0.279

Since these are direct replications (same methodology), it is
appropriate to pool the effect sizes together (exact methodology
trumps the apparent difference in the effect sizes). The pooled
effect size is dominated by the larger samples and is approximately
g = 0.33.

The power of each experiment to reject the null for this pooled effect
size is

For the first study: power = 0.109

For the second study: power=0.654

The product is 0.0715, which is small enough to indicate publication
bias. I've never seen a case exactly like this, but it could happen. I
have seen cases where three experiments lead to a conclusion of
publication bias, although one could argue that they were not _direct_
replications of the methodology.
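A minimal reconstruction of this example (my own sketch, not Greg's code): the effect sizes are pooled by inverse-variance weighting, and power is computed with a normal approximation to the two-sample t-test, so the values come out slightly above the exact noncentral-t figures quoted above (0.109 and 0.654), but the power product still falls below the .1 criterion.

```python
from math import erf, sqrt

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def g_variance(g, n1, n2):
    """Approximate sampling variance of Hedges' g."""
    return (n1 + n2) / (n1 * n2) + g ** 2 / (2 * (n1 + n2))

def approx_power(d, n1, n2):
    """Normal approximation to the power of a two-sided, alpha = .05,
    two-sample t-test when the true standardized effect is d."""
    ncp = d * sqrt(n1 * n2 / (n1 + n2))  # noncentrality parameter
    z_crit = 1.959964                    # two-sided .05 critical value
    return phi(ncp - z_crit) + phi(-ncp - z_crit)

# (g, n1, n2) for the two direct replications in the example above
studies = [(0.904, 10, 10), (0.279, 100, 100)]

weights = [1.0 / g_variance(g, n1, n2) for g, n1, n2 in studies]
pooled = sum(w * g for w, (g, _, _) in zip(weights, studies)) / sum(weights)

powers = [approx_power(pooled, n1, n2) for _, n1, n2 in studies]
product = powers[0] * powers[1]
# pooled comes out near 0.33; the power product is roughly 0.075,
# below the conventional .1 criterion for excess significance.
```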



Gregory Francis

Jul 19, 2012, 3:44:25 PM
I have had enough people contact me about Uri's manuscript that I decided to write up a formal rebuttal. A copy can be found at


Daniel Lakens

Jul 21, 2012, 2:18:30 AM
Let's focus on moving the field forward as a group.

Daniel Lakens

Aug 3, 2012, 3:48:38 AM
Hi Uri,

Only 6 days after I made the previous comment in this thread, Greg Francis submitted a paper to Psychological Science concerning a publication I'm a co-author on. Since I have to comment on that manuscript in any case, I was wondering whether you are still interested in receiving these comments?



James Lyons-Weiler

Aug 24, 2018, 10:25:51 AM
to Open Science Framework
I have to commend your field for actually taking this issue, and other issues, seriously.

Compared to many other fields of inquiry, the methodological and philosophical due diligence here is commendable.

If you have been engaged in this discourse, kudos.

I am looking for empirical examples of collections of a priori power values, along with sample sizes, p-values, and "Significant"/"Non-Significant" calls, for a new index I call the Corroboration Index.

We try to incorporate researcher degrees of freedom and to maintain some control over biases in meta-analyses, but from what I've seen, other than reporting heterogeneity measures and the like, we are still left without a solid sense of what the overall results are telling us, in part because of publication bias, and in part because of mistrust of low-powered studies (and, for some, mistrust of studies with power > 0.8).

The Corroboration Index takes each study at face value in terms of its knowledge claim, tempered by a priori power estimates. The simplest expression of the index that is not power-naive is the sum of the power of the studies on a given H0 that reject H0, divided by the sum of the power of the studies on that H0 that fail to reject it. The two parameters that influence the CI are, of course, the number of positive and negative results published (subject to publication bias) and the power of those studies (which of course has its own sources of variance and, if you like, biases).

The value, however, lies in the ideal future in which both positive and negative results are published; there, the method really helps balance the quality of the evidence. Say all studies are equally powered and split 50:50 between Reject and Do Not Reject: then CI = 1.0. If the negative results tend to be lower in power, so that the failures to reject are more likely due to low power, the CI will increase. If the negative results are amply powered, however, it will tend back toward 1.0.

Three versions can be envisioned: the power-naive ratio of study counts; one that uses sums of power values (CI_s); and one that uses averages of power values (CI_a, to correct for publication bias). The latter two would have different merits and uses in the toolbox, depending on the outcome of tests for publication bias.
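The three variants might be sketched like this (my reading of the definitions above; names and signatures are illustrative and not taken from the attached spreadsheet):

```python
def corroboration_index(studies, variant="sum"):
    """Corroboration Index over a list of (power, rejected) pairs,
    where rejected is True if the study rejected H0.
    variant: 'naive' (ratio of counts), 'sum' (sums of powers),
    or 'mean' (averages of powers)."""
    rej = [p for p, rejected in studies if rejected]
    fail = [p for p, rejected in studies if not rejected]
    if variant == "naive":
        num, den = len(rej), len(fail)
    elif variant == "sum":
        num, den = sum(rej), sum(fail)
    elif variant == "mean":
        num = sum(rej) / len(rej) if rej else 0.0
        den = sum(fail) / len(fail) if fail else 0.0
    else:
        raise ValueError(variant)
    return num / den if den else float("inf")

# Equally powered studies, split 50:50 -> CI = 1.0, as in the example above
balanced = [(0.8, True)] * 5 + [(0.8, False)] * 5
print(corroboration_index(balanced))  # -> 1.0

# Negative results with lower power -> CI rises above 1
lowpow_neg = [(0.8, True)] * 5 + [(0.3, False)] * 5
print(corroboration_index(lowpow_neg))
```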

Attached is a spreadsheet with random-number examples implementing the Corroboration Index.

As I said, I'm looking for empirical examples from published meta-analyses that collected a priori (or post hoc) power analyses, and of course for thoughts and feedback.

James Lyons-Weiler
Corroboration Index.xlsx