Large-scale replication projects of recent years (e.g., ManyLabs I, II, and III; the Reproducibility Project: Psychology) showed that the “replication crisis” in psychology is more than just hot air: According to recent estimates, ~60% of current psychological research is not replicable. (I will not go into details here about what counts as a replication; the 60% number can certainly be debated on many grounds, but the take-home message is: It's devastating.) This spurred a lot of developments, such as the TOP guidelines, which define transparency and openness criteria for scientific publications.
The field is thinking about how we can ensure that we generate more actual knowledge and fewer false positives, or, in the words of John Ioannidis: How to make more published research true.
In order to fathom the potential consequences for our own department, our department’s administration unanimously decided to establish an Open Science Committee (OSC).
The committee’s mission and goals include:
- Monitor the international developments in the area of open science and communicate them to the department.
- Organize workshops that teach skills for open science (e.g., How do I write a good pre-registration? What practical steps are necessary for Open Data? How can I apply for the Open Science badges? How do I do an advanced power analysis? What are Registered Reports?).
- Develop concrete suggestions concerning tenure-track criteria, hiring criteria, PhD supervision and grading, teaching, curricula, etc.
- Channel the discussion concerning standards of research quality and transparency in the department. Even if we share the same scientific values, the implementations might differ between research areas. A medium-term goal of the committee is to explore in what way a department-wide consensus can be established concerning certain points of open science.
The OSC developed some first suggestions about appropriate actions that could be taken in response to the replication crisis at the level of our department. We focused on five topics:
- Supervision and grading of dissertations
- Voluntary public commitments to research transparency and quality standards (this also includes supervision of PhDs and coauthorships)
- Criteria for hiring decisions
- Criteria for tenure track decisions
- How to allocate the department’s money without setting incentives for p-hacking
Raising the bar naturally provokes backlash. Therefore we emphasize three points right from the beginning:
- The described proposals are no “final program”, but a basis for discussion. We hope these suggestions will trigger a discussion within research units and the department as a whole. Since the proposals target a variety of issues, they of course need to be discussed in the appropriate committees before any actions are taken.
- Different areas of research differ in many aspects, and the actions taken can differ between these areas. Despite the probably different modes of implementation, there can be a consensus regarding the overarching goal – for example, that studies with higher statistical power offer higher gains in knowledge (ceteris paribus), and that research with larger gains in knowledge should be supported.
- There can be justified exceptions from every guideline. For example, some data cannot be sufficiently anonymized, in which case Open Data is not an option. The suggestions described here should not be interpreted as chains on the freedom of research, but rather as a statement about which values we as a research community represent and actively strive for.
Two chairs are currently developing a voluntary commitment to research transparency and quality standards. These might serve as a blueprint, or at least as food for thought, for other research units. When finished, these commitments will be made public on the department’s website (and also on this blog). Furthermore, we will collect our suggestions, voluntary commitments, milestones, etc. in a public OSF project.
Do you have an Open Science Committee or a similar initiative at your university? We would love to bundle our efforts with other initiatives, share experiences, material, etc. Contact us!
— Felix Schönbrodt, Moritz Heene, Michael Zehetleitner, Markus Maier
Stay tuned – soon we will present a first major success of our committee!
(Follow me on Twitter for more updates on #openscience and our Open Science Committee: @nicebread303)
There are at least three traditions in statistics that work with some kind of likelihood ratio (LR): the “Bayes factor camp”, the “AIC camp”, and the “likelihood camp”. In my experience, unfortunately, most people do not have an intuitive understanding of LRs. When I give talks about Bayes factors, the most predictable question is “And how much is a BF of 3.4? Is that something I can put confidence in?”.
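To make the quantity concrete before going further: in the “likelihood camp” sense, an LR is simply the probability of the observed data under one hypothesis divided by its probability under another. A minimal sketch (the hypotheses and numbers here are my own illustration, not taken from any of the sources cited below):

```python
from math import comb

def binom_lik(k, n, p):
    """Binomial likelihood of k successes in n trials with success probability p."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# Observed: 14 heads in 20 coin tosses.
# Likelihood ratio for H1: p = 0.7 versus H0: p = 0.5.
lr = binom_lik(14, 20, 0.7) / binom_lik(14, 20, 0.5)
print(round(lr, 2))  # ~5.18: the data are about 5 times more probable under p = 0.7
```

Note that for two point hypotheses like these, the binomial coefficient cancels, so the LR depends only on the likelihoods of the exact data sequence.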
Recently, I tried to approach the topic from an experiential perspective (“What does a Bayes factor feel like?“) by letting people draw balls from an urn and monitor the Bayes factor for an equal distribution of colors. Now I realize that I rediscovered an approach that Richard Royall took in his 1997 book “Statistical Evidence: A Likelihood Paradigm”: He also derived labels for likelihood ratios by looking at simple experiments, including ball draws.
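The urn exercise can be computed in a few lines. The sketch below assumes the simplest two-color version of the setup (my simplification, not necessarily the exact exercise from the talk): H0 says both colors are equally frequent (p = 0.5), while H1 puts a uniform prior on the proportion, which gives the Bayes factor a closed form:

```python
from math import comb

def bf01(k, n):
    """Bayes factor for H0: p = 0.5 vs. H1: p ~ Uniform(0, 1),
    after observing k balls of one color in n draws."""
    m0 = 0.5 ** n                      # marginal likelihood under H0 (point hypothesis)
    m1 = 1.0 / ((n + 1) * comb(n, k))  # Beta(k+1, n-k+1) integral under H1
    return m0 / m1

# A perfectly balanced sample yields only modest evidence for equality:
print(round(bf01(10, 20), 2))      # ~3.70 in favor of H0
# A lopsided sample quickly produces strong evidence against it:
print(round(1 / bf01(18, 20), 1))  # BF10 ≈ 262.8 against H0
```

This also hints at why a BF around 3–4 can feel underwhelming: even a perfectly balanced sample of 20 draws gets no further than that in favor of the equality hypothesis.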
But beyond this approach of getting experiential access to LRs, all of the traditions mentioned above have in some way proposed labels or “grades” of evidence.
These are summarized in my cheat sheet below.
(There’s also a PDF of the cheat sheet).
There’s considerable consensus about what counts as “strong evidence” (but this is not necessarily an independent replication – maybe they just copied each other).
But there’s also the position that we do not need labels at all – the numbers simply speak for themselves! For an elaboration of that position, see Richard Morey’s blog post. Note that Kass & Raftery (1995) are often cited for their grades in the cheat sheet, but according to Richard Morey they rather belong to the “need no labels” camp (see here and here). On the other hand, EJ Wagenmakers mentions that they use their guidelines themselves for interpretation and asks “when you propose labels and use them, how are you in the no-labels camp?”. Well, decide for yourself (or ask Kass and Raftery personally) whether they belong in the “labels” or “no-labels” camp.
Now that I have some experience with LRs, I am inclined to follow the “no labels needed” position. But whenever I explain Bayes factors to people who are unacquainted with them, I really long for a descriptive label. I think the labels are short-cuts, which relieve you from the burden of explaining how to interpret and judge an LR (you can decide for yourself whether that is a good or a bad property of the labels).
To summarize: as LRs are not self-explanatory to the typical audience, you either need a label (which is self-explanatory, but probably too simplified and not sufficiently context-dependent), or you should give an introduction on how to interpret and judge these numbers correctly.
Literature on grades of evidence:
Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach. Springer Science & Business Media.
Burnham, K. P., Anderson, D. R., & Huyvaert, K. P. (2011). AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behavioral Ecology and Sociobiology, 65, 23–35. doi:10.1007/s00265-010-1029-6
Symonds, M. R. E., & Moussalli, A. (2011). A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion. Behavioral Ecology and Sociobiology, 65, 13–21. doi:10.1007/s00265-010-1037-6
Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian Statistics 2 (pp. 249–270). Elsevier.
Jeffreys, H. (1961). The theory of probability. Oxford University Press.
Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course. Cambridge University Press.
Royall, R. M. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.
Royall, R. M. (2000). On the probability of observing misleading statistical evidence. Journal of the American Statistical Association, 95, 760–768. doi:10.2307/2669456
The “we need no labels” camp:
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773â€“795.
Morey, R. D. (2015). On verbal categories for the interpretation of Bayes factors (Blog post). http://bayesfactor.blogspot.de/2015/01/on-verbal-categories-for-interpretation.html
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237.