These days psychology really is exciting, and I do not mean the Förster case …
In May 2014 a special issue full of replication attempts has been released – all open access, all raw data released! This is great work, powered by the open science framework and from my point of view a major leap forward.
One of the replication attempts, about wether “Cleanliness Influence Moral Judgments” generated a lot of heat in the social media. This culminated in a blog post by one of the replicators, Brent Donnellan, an independent analysis of the data by Chris Fraley, and finally a long post by Simone Schnall who is the original first author of the effect, which generated a lot of comments.
Here’s my personal summary, conclusions, and insights I gained from the debate (Much of this has been stated by several other commenters, so this is more the wisdom of the crowd than my own insights).
As long as we stick to the p < .05 ritual, 1 in 20 studies will produce false positive results if there is no effect in the population. Depending on the specific statistical test, the degree of violation of the assumptions of this test, and the amount of QRPs you apply, the actual Type I error rate can be both lower and higher than the nominal 5% (in practice, I’d bet on “higher”).
We know how to fix this – e.g., Bayesian statistics (Wagenmakers, Wetzels, Borsboom, & van der Maas, 2011), “revised standards” (i.e., use p < .005 as magic threshold; Johnson, 2013), or focus on accuracy in parameter estimation instead of NHST (Maxwell, Kelley, & Rausch, 2008; Schönbrodt & Perugini, 2013).
A single study is hardly ever conclusive. Hence, a way to deal with uncertainty is to meta-analyze and to combine evidence. Let’s make a collaborative effort to increase knowledge, not (only) personal fame.
This has been emphasized several times, and, as far as I read the papers of the special issue, no replicator made this implication. In contrast, the discussions were generally worded very cautious. Look in the original literature of the special issue, and you will hardly (or not at all) find evidence for “replication bullying”.
The nasty false-positive monster is lurking around for everybody of us. And when, someday, one of my studies cannot be replicated, then, of course I’ll be pissed off at first, but then …
It does *not* imply that I am a bad researcher.
It does *not* imply that the replicator is a “bully”.
But it means that we increased knowledge a little bit.
Or, as Dale Barr (@dalejbarr) tweeted: “If you publish a study demonstrating an effect, you don’t OWN that effect.”
As several others argued, it usually is very helpful, sensible, and fruitful to include the original authors in the replication effort. In an ideal world we make a collaborative effort to increase knowledge, and a very good example for a “best practice adversarial collaboration”, see the paper and procedure by Dora Matzke and colleagues (2013). Here, proponents and skeptics of an effect worked, together with an impartial referee, on a confirmatory study.
To summarize, involvement of original authors is nice and often fruitful, but definitely not necessary.
Remember the old times where “debate” was carried out with a time-lag of at least 6 months (until your comment was printed in the journal, if you were lucky), and usually didn’t happen at all?
I *love* reading and participating the active debate. This feels so much more like science than what we used to do. What happens here is not a shitstorm – it’s a debate! And the majority of comments really is directed at the issue and has a calm and constructive tone.
So, use the new media for post-publication-review, #HIBAR (Had I Been A Reviewer), and real exchange. The rules of the game seem to change, slowly.