The replication movement in psychology has had many positive effects, such as the discussion of how to avoid p-hacking and the emphasis on increased transparency, including posting data, detailed methods sections, and the results of unpublished studies on publicly available websites. These practices will undoubtedly improve our science.
But something is seriously out of whack. Despite its benefits, the replication movement has had a polarizing effect. Whereas most researchers involved in the movement are well-intentioned and have the best interests of the field at heart, some seem bent on disproving other researchers’ results by failing to replicate. Whereas some researchers have embraced the movement and taken part in it, others are deeply suspicious and fear that ill-intentioned replicators will target them, fail to replicate their findings, and damage their reputations.
Why are many people afraid? One reason, I believe, is that there has been more emphasis on false positives than false negatives. When an effect fails to replicate, the spotlight of suspicion shines on the original study and the authors who conducted it. “False Positive Alert” flashes like a neon sign as the buzz spreads in the Tweetosphere and Blogworld. But why should we assume that a failure to replicate is “truer” than the original study? Shouldn’t the spotlight shine as brightly on the replicators, with a close examination of their research practices, in case they have obtained a false negative?
There are many reasons why a false negative could occur, including these:
- Replications might be conducted by researchers who are inexperienced or lack expertise, either in general or in the particular area they are trying to replicate.
- As has been well documented, researchers are human and can act in ways that make them more likely to confirm a hypothesis, resulting in p-hacking. But replicators are human too, and if their hypothesis is that an effect will not replicate, they too can act in ways that increase the likelihood of obtaining that outcome—a practice we might call p-squashing. For example, it would be relatively easy to take an independent variable that had a significant effect in the laboratory, translate it into an on-line study that delivers the manipulation in a much weaker fashion, and then run hundreds of participants, resulting in a null effect. Adding such a study to a meta-analysis could cancel out positive findings from several smaller studies because of its very large sample size, resulting in meta p-squashing.
- As others have noted (e.g., Stroebe & Strack, 2013), a direct replication could fail because it was conducted in a different context or with a different population, and as a result did not manipulate the psychological construct in the same manner as did the original study.
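The "meta p-squashing" scenario described above can be made concrete with a toy simulation. The sketch below runs a fixed-effect meta-analysis (inverse-variance weighting) on five small lab studies that each found a medium effect, then adds a single very large null study; the effect sizes and sample sizes are invented purely for illustration, not drawn from any real replication project:

```python
# Toy illustration of "meta p-squashing" via fixed-effect meta-analysis.
# All effect sizes and sample sizes below are hypothetical.
import math

def se_of_d(n_per_group, d):
    """Approximate standard error of Cohen's d for a two-group design."""
    return math.sqrt(2.0 / n_per_group + d**2 / (4.0 * n_per_group))

def pooled_effect(studies):
    """Fixed-effect meta-analysis: inverse-variance weighted mean effect.

    `studies` is a list of (d, se) pairs; returns (pooled_d, pooled_se).
    """
    weights = [1.0 / se**2 for _, se in studies]
    d = sum(w * d_i for w, (d_i, _) in zip(weights, studies)) / sum(weights)
    se = math.sqrt(1.0 / sum(weights))
    return d, se

# Five small lab studies, each finding a medium effect (d = 0.5, n = 30/group).
small = [(0.5, se_of_d(30, 0.5)) for _ in range(5)]

# One large online replication with a weaker manipulation and a null result
# (d = 0.0, n = 1000/group). Its huge weight dominates the pooled estimate.
large = [(0.0, se_of_d(1000, 0.0))]

d_small, _ = pooled_effect(small)
d_all, se_all = pooled_effect(small + large)

print(f"pooled d, small studies only:    {d_small:.2f}")
print(f"pooled d, plus large null study: {d_all:.2f} (z = {d_all / se_all:.2f})")
```

With these made-up numbers, the pooled effect collapses from d = 0.50 to roughly d = 0.06, and the combined z-statistic falls below conventional significance thresholds: a single high-n null study outweighs five positive ones.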
Do I have evidence that many of the studies that have been done as part of the current replication movement have been plagued by the above problems? Well, not much, though I suggest that the evidence is equally weak that false positives are rampant. One might even argue that there is just as much evidence that we have a crisis of false negatives as we do a crisis of false positives.
This is important because both kinds of errors can have serious consequences. As many in the replication movement have argued, false positives can be costly to a field’s credibility and to subsequent researchers who spend valuable research time going down a blind alley. But false negatives can also be damaging, both to the reputation of the original researcher and the progression of science (see Fiedler, Kutzner, & Krueger, 2012, for an excellent discussion of this issue). Consequently, neither those who attempt replications nor the authors of original studies should stake out the moral high ground in this debate. We should all scrutinize replications with the same critical eye as we do original studies and not assume that a failure to duplicate a result means that the original finding was false. For example, if replications are submitted to a journal, they should undergo the same rigorous review process as any other submission.
There is another unintended effect of the replication movement, namely that it places too much emphasis on duplication and not enough on discovering new and interesting things about human behavior, which is, after all, why most of us got into the field in the first place. As noted by Jim Coan, the field has become preoccupied with prevention and error detection—negative psychology—at the expense of exploration and discovery. The biggest scientific advances are usually made by researchers who pursue unorthodox ideas, invent new methods, and take chances. Almost by definition, researchers who adopt this approach will produce findings that are less replicable than those of researchers who conduct small extensions of established methodologies, at least at first, because the moderator variables and causal mechanisms of novel phenomena are not as well understood. I fear that in the current atmosphere, many researchers will gravitate to safe, easily replicable projects and away from novel, creative ones that may not be easily replicable at first but could lead to revolutionary advances.
For those interested in conducting replications, there might be a happy medium. For example, researchers all over the world have conducted replications of the same phenomenon as part of the “Many Labs” project. I suggest that we would learn more from this endeavor with a small twist: Ask all participating labs to add an interesting moderator variable of their choice to the design, with random assignment, in addition to performing a direct replication. This would nudge replicators into thinking deeply about the phenomenon they are trying to replicate and making predictions about the underlying psychological processes, possibly leading to substantial advances in our understanding of the phenomenon under study—that is, to discovery as well as duplication.
In any polarized debate, common ground becomes obscured. It is thus worth remembering that all scientists agree on two things: We want our methods to be as sound as possible and we value novel, creative, groundbreaking findings. It would be unfortunate if the emphasis on one came at the expense of the other.
(Note: This post benefited greatly from comments by Jerry Clore, Dan Gilbert, and Brian Nosek—but by thanking them I do not mean to imply in the least that they agree with anything I have said.)