Brian (et al.),
I was also thinking about how to treat this ambiguity where people use “replication” as a synonym for “successful replication”.
The problem with this formula is that the meta-analytic result will be subject to publication bias. It is also harder to calculate because a whole meta-analysis has to be done, including scouring for unpublished studies (a process never 100% effective). It would make more sense under a publication regime we do not have ... yet ...
The original formula has the advantage of being more straightforward, and functionally equivalent in a world where non-replications don’t get published (I guess reversed replications do, but they are rare; only Glaser & Banaji comes to mind, and in that case boundary finding is the more appropriate research response). This is important given that there’s a lot of wiggle room in defining direct, let alone conceptual, replications.
It does make me think about defining different things we want to replicate:
Direct replication: of the whole experiment, IV and DV
Effect replication: of the IV’s effects on a DV, conceptualized in different ways
Conceptual replication: varying the IV and/or the DV
You know (and maybe agree) that some publishing trends in the field have gone too far into finding clever or surprising effects, rather than establishing underlying theoretical principles and boundaries. (Yes, yes, we get it: making concepts available to the mind, via any of the senses, increases the likelihood of congruent behavior.) I want to make clear how replication is valuable at each level rather than just focusing on direct replication.
Direct replication is most important to the *integrity of our science*. (Is this effect cherry-picked or fraudulent?)
Effect replication is most important to the *application of our science*. (Will those eyes increase honesty in a variety of domains?)
Conceptual replication is most important to the *theory behind our science*, as well as the application. (What do the eyes represent, exactly, conceptually?)
That said, it is easier to do that meta-analysis for the first two than for the third, and also easier to agree on what counts as a replication due to the likelihood of direct citation of the parent study.
Dr Roger Giner-Sorolla
Reader in Social Psychology
School of Psychology
University of Kent
Canterbury, Kent CT2 7NP
United Kingdom
tel. +44 (0)1227 823085, leave out 0 if calling from abroad
From: bno...@gmail.com [mailto:bno...@gmail.com] On Behalf Of Brian Nosek
Sent: 16 February 2012 17:32
To: Mark Brandt; Roger Giner-Sorolla; Jeffrey Spies; Daniel Lakens
Subject: RV thread
So far, the five of us are the ones who have weighed in on the RV manuscript - at least in the Google doc itself.
I thought it might be useful to start a discussion thread that doesn't fill all the OSF folks' inboxes with detailed RV discussion (larger points could still be brought up there).
I am attaching an article that may be of some use, and below is the text of a comment that I just added to the paper that might deserve some rapid discussion rather than resolution through the doc comment thread:
--------------------------------------
I haven't gone through the rest of the manuscript yet, but I had a thought about the calculation of replication value yesterday. This builds on my earlier point that RV should be a complement to meta-analysis, not an alternative, and that just counting the number of replications does not consider the reliability of the demonstrations and outcomes of the replications.
So, how about something like:
RV = (times cited) / (p-value of the meta-analytic result, across however many replications there are)
[lower values indicating replication is more important; the denominator is just the meta-analytic p, not a subtraction]
The computation might be adjusted, but the key point is that the p-value of the meta-analytic result provides a simple means of incorporating all the relevant concepts (effect size, sample size, number/size of replications) into an index of how likely it is that this effect is due to chance.
There may be a place for adjusting the magnitude based on whether the p-value comes from only a single study or from several, but that adjustment would probably need a steep slope (really only sharpening RV for single-study demonstrations, because the meta-analytic p already incorporates some of this information).
----
If an adjustment by the actual number of replications added information to the meta-analytic p-value, it could be of the form 2 - ((1/replications)^2), making:

RV = [(times cited) / (p-value of meta-analysis of replications)] / [2 - ((1/replications)^2)]

The effect of the replications adjustment would be to halve the RV when the score is based on only one study and barely change the RV at all when the score is based on 6, 10, or 20 studies. To see the shape of the adjustment, google this: graph y = (1/x)^2.

This adjustment has some redundancy with the p-value. Its main value is perhaps sociological rather than statistical. That is, any result, no matter how strongly demonstrated (i.e., low p-value), is more trustworthy if it has been reproduced - particularly by independent researchers. So the statistic would incorporate statistical confidence (the p-value portion) and replication confidence (the replications portion).
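For concreteness, here is a minimal Python sketch of the adjusted RV exactly as written above; the names times_cited, meta_p, and n_replications are just illustrative placeholders, not part of any agreed formula:

def replication_value(times_cited, meta_p, n_replications):
    # RV = [citations / meta-analytic p] / [2 - (1/replications)^2], as in the comment above.
    base = times_cited / meta_p
    adjustment = 2 - (1 / n_replications) ** 2
    return base / adjustment

# Illustrative numbers only: a hypothetical effect cited 200 times,
# first as a single study, then after six replications.
print(replication_value(times_cited=200, meta_p=0.04, n_replications=1))
print(replication_value(times_cited=200, meta_p=0.001, n_replications=6))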
Second, what you are trying to do here is to weight the evidence for the effect that has already been collected and combine it with a measure of its importance or impact. Basically, you want to maximize the expected utility of the replication. There are many possible actions to take (i.e., experiments to replicate) and you want to pick the best one to spend your limited resources on. As it happens, such decision problems are solved by applying Bayesian stats. Perhaps you can also do this with frequentist stats -- ad-hoc solutions exist for many things -- but I don't see why you would. So instead of pondering the problem without the benefit of a formal framework, I suggest that this problem is in fact analogous to other, superficially different problems that also involve the maximization of expected utility. This analysis will automatically drive one to the right insights by using probabilities (not those of the p-value kind).
For instance, the utility-maximization idea suggests that what matters is not just the prior support for a theory and its impact measured in some way or other (i.e., if low prior support and high impact, then high replication value). Instead, one needs to consider the utility of a failed replication and the utility of a successful replication. For a new finding, both utilities may be high (for instance, because either outcome would lead to a substantial increase in our knowledge). For an older, more established finding, the utility of a successful replication may be very low.
Anyway, I just wanted to say that the problem we have is one where we have prior knowledge (i.e., earlier support for effects), many actions (experiments to replicate), limited resources, and uncertain outcomes. An optimal decision-making analysis would be very helpful.
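To make the expected-utility idea concrete, here is a toy Python sketch; the priors and utilities are invented numbers purely for illustration, not anything proposed in this thread, and it ignores complications such as power and false negatives:

def expected_utility(p_effect_real, u_success, u_failure):
    # Expected utility of attempting a replication, given a prior probability that the
    # effect is real and the utilities of a successful vs. a failed replication.
    return p_effect_real * u_success + (1 - p_effect_real) * u_failure

candidates = {
    # New finding: either outcome is informative, so both utilities are high.
    "new surprising finding": expected_utility(0.5, u_success=8, u_failure=9),
    # Established finding: a successful replication adds little; a failure would matter a lot.
    "old established finding": expected_utility(0.95, u_success=1, u_failure=10),
}

# Pick the replication target with the highest expected utility.
best = max(candidates, key=candidates.get)
print(candidates, best)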
Cheers,
E.J.
--
********************************************
WinBUGS workshop in Amsterdam: http://bayescourse.socsci.uva.nl
Eric-Jan Wagenmakers
Department of Psychological Methods, room 2.16
University of Amsterdam
Weesperplein 4
1018 XA Amsterdam
The Netherlands
Web: www.ejwagenmakers.com
Email: EJ.Wage...@gmail.com
Phone: (+31) 20 525 6420
“Man follows only phantoms.”
Pierre-Simon Laplace, last words
********************************************
I've really enjoyed following this discussion. One thought that keeps creeping to mind as I follow along is in regard to the implication of making RV a function of # of times cited. I agree that an important dimension of RV is the interest in the effect; however, it seems that an RV statistic based on # of citations requires that a finding be appropriately 'aged' before there is substantial value in replicating it. This may return to the discussion about the general goals of the replication value statistic, but if the goal is to motivate others to replicate a finding, it seems that there could be a good deal of value in replication before a lot of citations have taken place. In particular, if we are concerned with Type I errors, a replication value that is a function of # of citations can also be viewed as a statistic highlighting the potential degree of negative impact that a Type I error has had on the field. Thus, if a replication attempt with high RV (using an RV statistic that incorporates # of citations) has sufficient power to detect an effect and the replication fails, we are suddenly in a position where there would have been considerably more value in the replication had it occurred earlier (in terms of the time and effort spent by other scientists who have attempted to build on the findings of the original study).
Granted, I don't know that I can think of a good solution to this. Maybe for 'new' effects, an appropriate measure of impact would be the impact factor of the publication source, as opposed to the number of citations? At this point I just wondered whether others also felt that this could be a 'blind spot' in the RV statistic as stated.
Russ
Hi Jamie,
How do you operationalize the difference between these two types of studies? And how could these differences be incorporated into the replication value? You are right that the RV currently treats all studies as equal (although the attempt to incorporate the ES does try to give more weight to more reliable findings). I agree it is important to clarify that some studies should be regarded as exploratory. But instead of defining this a priori, a high RV might be considered an indication that a finding is of interest but awaiting confirmatory studies that replicate the effect. Or could this same point be made better by making the differences between exploratory and confirmatory studies more central in the formula?
Daniel
> -----Original Message-----
> From: openscienc...@googlegroups.com
> [mailto:openscienc...@googlegroups.com] On Behalf Of Roger Giner-Sorolla
> Sent: Sunday 4 March 2012 1:04
> To: Open Science Framework
> Subject: [OpenScienceFramework] Re: Replication Value project:
> separate thread FYI
>
http://www.frontiersin.org/computational_neuroscience/10.3389/fncom.2012.00008/full
Mike
Hi everyone,
I've updated the intro of the Google doc file based on some suggestions - commenting on it might be more useful at a later moment. What is now needed is that we reach agreement about how exactly we will calculate the Replication Value. The impact (citations) and attempts (replications) parts are clear - we need to decide how to incorporate the reliability. After that is done, we can perhaps distribute some tasks (writing paragraphs, running simulations, etc.) and try to get a first draft of the manuscript ready - the issue of replicability seems to be more timely than ever.
I have read up on my stats, and I think that with reliability we are mainly interested in the probability that an observed finding can be expected to replicate. I hope you agree. This value is known as p-rep. How to calculate the probability that a finding will replicate has been debated ever since Psychological Science started asking researchers to use p-rep instead of p-values (for several articles by the people who matter in this debate, see http://psycnet.apa.org/journals/met/15/2/ ). At least intuitively, it makes a lot of sense to include the probability that a finding will replicate in the Replication Value ;)
There are several suggested ways to calculate the probability that a finding will replicate, so if this sounds interesting, we still have to choose a specific one. I think the proposal in the attached PDF (Iverson, Wagenmakers, & Lee, 2010, "A model averaging approach to replication"), using Bayesian model averaging, seems like a good way to calculate a p-rep to include in the RV. I hope EJ Wagenmakers can tell us whether that indeed makes sense, and whether it could be implemented in a way that is easy for the general scientific community, or whether another version (the p-rep by Killeen, even though it has its problems) might be better.
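As a rough illustration of the simplest of these options, here is a Python sketch of Killeen's p-rep via the common closed-form approximation from a one-tailed p-value; this is only one of the debated variants (the Bayesian model-averaging p-rep is another), and the function name and numbers are illustrative:

def killeen_p_rep(p_one_tailed):
    # Approximate probability of replicating the direction of an observed effect
    # (closed-form approximation associated with Killeen, 2005).
    return 1.0 / (1.0 + (p_one_tailed / (1.0 - p_one_tailed)) ** (2.0 / 3.0))

# Example: a two-tailed p of .05 (one-tailed .025) gives a p-rep of roughly .92.
print(round(killeen_p_rep(0.025), 2))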
We can still try to use the SE (or relative SE), but I don't know how. If you are a fan of that suggestion, please detail how it should be calculated (not only for the first, but also for the second replication). I now feel it would be heading too much toward a meta-analysis suggestion (especially after the first replication), but I might not have thought enough about it.
In addition to deciding upon the Reliability factor, we need to answer the following question:
We all agree that the number of replications decreases the Replication Value. We want to incorporate a correction for the reliability of the initial finding. How do these two factors relate to each other? How much reliability can compensate for a single replication? The idea was that one big study is more reliable than two or three smaller studies, but statisticians do not agree, because random error in any single study is always possible, so three small studies control better for such random error than one big study. Also, to be really reliable, studies need to be really big (n = 153,669, according to Hunter, 2001, "The desperate need for replications"), so the difference between 1000 and 20 participants will not make a huge difference - according to Hunter, a study with 1000 participants just needs to be replicated less often than a study with 20 participants. If we use p-rep (a proportion from 0 to 1; we should use 1/p-rep) or the RSE (also a proportion), how much should the RV increase with how much loss in reliability? I personally think more replications are more important than higher reliability. Playing around with the simulations Marco provided might be useful here.
Looking forward to your input!
Daniel
The number of citations is divided by the sum of the power of all replications, squared, plus one. The power of a study has a value between 0 and 1; the higher this value (and thus the more high-powered replications are performed), the lower the replication value. Importantly, it does not matter whether the attempts were successful or not. The denominator is incremented by one to avoid undefined RVs, and the square root is taken to reflect the fact that especially the first few replications provide important information about the robustness of an effect, whereas the value of each additional replication attempt is less than that of the previous one (we could discuss alternatives, such as the natural log). The denominator is based on the power of each replication, rather than simply on the number of replications, because not all replications are equal. It is essential that studies have enough statistical power to reveal a hypothesized effect, yet it is still common for psychologists to run hugely underpowered experiments. Replications should have sufficient statistical power to observe the predicted effect (with a typical minimum of .80, but preferably power should be as high as is feasible).
Note that we are not weighting each study by the precision of the effect size (typically the inverse of the squared standard error, or the inverse variance weight), as is the custom in meta-analyses. The reason is that we are not interested in the size of the effect, or the precision with which it can be estimated. The goal of the replication value is to provide an indication of the value of a replication based on the number of sufficiently powered replications. At the same time, both power and the inverse variance weight are to a large extent determined by the sample size, so just as larger studies receive more weight in meta-analyses, larger studies will also receive more weight in determining the Replication Value.
Very interested in what you all think. It would be nice to turn this idea of an RV into something we can start to use. If you think it makes sense, great - if not, any ideas on how to improve it?
To clarify, because I think I did not explain it well before: the power of each replication (which obviously lies between 0 and 1) is summed over replications. So if replication 1 has a power of .80 and replication 2 has a power of .92, then the denominator will be (1.72 squared) + 1. This way the denominator increases with each replication, not by 1 but by the power of that replication. This will on average give more weight to larger studies, while at the same time being conceptually more related to replication, and less to meta-analysis.
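Here is a small Python sketch of this calculation, following the worked example above (the square-root and natural-log alternatives mentioned earlier can be swapped in for the transform); the function name and citation counts are illustrative only:

import math

def replication_value(times_cited, replication_powers, transform=lambda s: s ** 2):
    # RV = citations / (transform(sum of replication powers) + 1),
    # with the square as in the worked example above.
    total_power = sum(replication_powers)  # each power lies between 0 and 1
    return times_cited / (transform(total_power) + 1)

# Worked example: replications with power .80 and .92 for a study cited 100 times.
print(replication_value(100, [0.80, 0.92]))                       # denominator = (1.72 squared) + 1
print(replication_value(100, [0.80, 0.92], transform=math.sqrt))  # square-root alternative
print(replication_value(100, []))                                 # no replications yet: denominator = 1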
Hi,