Grades of Evidence Cheat Sheet

Anecdotal? Strong? Not worth mentioning? Beyond any doubt?

Author

Felix Schönbrodt

Published

June 11, 2026

There are at least three traditions in statistics that work with some kind of likelihood ratio (LR): the “Bayes factor camp”, the “AIC camp”, and the “likelihood camp”. In my experience, most people do not have an intuitive understanding of LRs. When I give talks about Bayes factors, the most predictable question is “And how much is a BF of 3.4? Is that something I can put confidence in?”.
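To make the link between the AIC camp and likelihood ratios a bit more concrete: in the information-theoretic tradition, a difference in AIC between two models is commonly converted into an "evidence ratio" via exp(ΔAIC / 2). The following is only a minimal sketch of that conversion; the function names are my own, and the resulting number is not the same thing as a Bayes factor, since it rests on maximized likelihoods plus a complexity penalty rather than on marginal likelihoods.

```python
import math


def aic_evidence_ratio(delta_aic: float) -> float:
    """Evidence ratio in favor of the better model.

    delta_aic is AIC(worse model) - AIC(better model), so it is >= 0.
    The function name is illustrative, not taken from any package.
    """
    return math.exp(delta_aic / 2.0)


def delta_aic_for_ratio(ratio: float) -> float:
    """Inverse mapping: the AIC difference that yields a given evidence ratio."""
    return 2.0 * math.log(ratio)


if __name__ == "__main__":
    # Print the evidence ratio for a few AIC differences.
    for delta in (0, 2, 4, 7, 10):
        print(f"delta AIC = {delta:>2}  ->  evidence ratio = {aic_evidence_ratio(delta):.1f}")
```

Going in the other direction, `delta_aic_for_ratio(3.4)` gives the AIC difference that corresponds to a ratio of 3.4, which may help people who are used to one scale to read the other.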

I tried to approach the topic from an experiential perspective (“What does a Bayes factor feel like?”) by letting people draw balls from an urn and monitor the Bayes factor for an equal distribution of colors. Later I realized that I had rediscovered an approach that Richard Royall took in his 1997 book “Statistical Evidence: A Likelihood Paradigm”: he also derived labels for likelihood ratios by looking at simple experiments, including ball draws.
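A ball-draw experiment in the spirit of Royall's can be sketched in a few lines. Assume two urns: one contains only white balls, the other is half white and half black. You draw with replacement and see only white balls. The likelihood ratio in favor of the all-white urn then doubles with every draw. This is a simplified illustration under those stated assumptions, not the urn demonstration described above, and the function name is hypothetical.

```python
from fractions import Fraction


def urn_likelihood_ratio(n_white_draws: int) -> Fraction:
    """LR for 'all balls are white' vs. 'half of the balls are white'
    after n consecutive white draws (with replacement)."""
    p_all_white = Fraction(1) ** n_white_draws      # always 1
    p_half_white = Fraction(1, 2) ** n_white_draws  # (1/2)^n
    return p_all_white / p_half_white


if __name__ == "__main__":
    for n in (1, 3, 5):
        print(f"{n} white draws in a row -> LR = {urn_likelihood_ratio(n)}")
    # 1 white draws in a row -> LR = 2
    # 3 white draws in a row -> LR = 8
    # 5 white draws in a row -> LR = 32
```

Read this way, an LR of 8 "feels like" seeing three white balls in a row, and an LR of 32 like seeing five in a row.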

But beyond this approach of gaining experiential access to LRs, all of the traditions mentioned above have proposed, in some way, labels or “grades” of evidence. These are summarized in this cheat sheet:

(Download as PDF)

Note that the apparent consensus is not necessarily “independent replication” – maybe they just copied each other.

But there’s also the position that we do not need labels at all – the numbers simply speak for themselves! For an elaboration of that position, see Richard Morey’s blog post. Note that Kass & Raftery (1995) are often cited for their labels in the cheat sheet, but according to Richard Morey they rather belong to the “need no labels” camp. On the other hand, EJ Wagenmakers points out that Kass and Raftery use their own guidelines for interpretation, and asks “when you propose labels and use them, how are you in the no-labels camp?”. Well, decide for yourself (or ask Kass and Raftery personally) whether they belong in the “labels” or the “no-labels” camp.

Now that I have more experience with LRs, I am inclined to follow the “no labels needed” position. But whenever I explain Bayes factors to people who are unacquainted with them, descriptive labels really are helpful. Pragmatically, the labels are shortcuts that relieve you of the burden of explaining how to interpret and judge an LR (you can decide for yourself whether that is a good or a bad property of the labels).

To summarize, as numerical LRs are not self-explanatory to the typical audience, I think you either need a label (which is self-explanatory, but probably too simplified and not sufficiently context-dependent), or you should give an introduction on how to interpret and judge these numbers correctly.

Literature on grades of evidence:

“AIC camp”

Burnham, K. P., & Anderson, D. R. (2002). Model selection and multimodel inference: A practical information-theoretic approach. Springer Science & Business Media.
Burnham, K. P., Anderson, D. R., & Huyvaert, K. P. (2011). AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. Behavioral Ecology and Sociobiology, 65, 23–35. doi:10.1007/s00265-010-1029-6
Symonds, M. R. E., & Moussalli, A. (2011). A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion. Behavioral Ecology and Sociobiology, 65, 13–21. doi:10.1007/s00265-010-1037-6

“Bayesian camp”

Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian Statistics 2 (pp. 249–270).
Jeffreys, H. (1961). Theory of probability (3rd ed.). Oxford University Press.
Lee, M. D., & Wagenmakers, E.-J. (2013). Bayesian cognitive modeling: A practical course. Cambridge University Press.

“Likelihood camp”

Royall, R. M. (1997). Statistical evidence: A likelihood paradigm. London: Chapman & Hall.
Royall, R. M. (2000). On the probability of observing misleading statistical evidence. Journal of the American Statistical Association, 95, 760–768. doi:10.2307/2669456

“We need no labels camp”

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90, 773–795.
Morey, R. D. (2015). On verbal categories for the interpretation of Bayes factors (Blog post). http://bayesfactor.blogspot.de/2015/01/on-verbal-categories-for-interpretation.html
Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16, 225–237.