April 17, 2015

There are at least three traditions in statistics that work with some kind of likelihood ratio (LR): the “Bayes factor camp”, the “AIC camp”, and the “likelihood camp”. In my experience, most people unfortunately do *not* have an intuitive understanding of LRs. When I give talks about Bayes factors, the most predictable question is: “And how much is a BF of 3.4? Is that something I can put confidence in?”
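To see how the AIC camp ends up with a ratio, here is a minimal Python sketch of the “evidence ratio” described by Burnham & Anderson (2002): the ratio of two models’ Akaike weights, which reduces to exp(ΔAIC / 2). The function name `aic_evidence_ratio` is my own, chosen for illustration.

```python
import math


def aic_evidence_ratio(aic_best: float, aic_other: float) -> float:
    """Evidence ratio of the better-fitting model over another model.

    This is the ratio of the two Akaike weights, which simplifies to
    exp(delta / 2), where delta = AIC_other - AIC_best.
    """
    delta = aic_other - aic_best
    return math.exp(delta / 2)


# A difference of 2 AIC units corresponds to a ratio of e (about 2.72).
print(round(aic_evidence_ratio(100.0, 102.0), 2))
```

On this scale, ΔAIC = 2 corresponds to a ratio of about 2.7 and ΔAIC = 4 to about 7.4, which makes AIC differences directly comparable to other LR-type quantities.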

Recently, I tried to approach the topic from an experiential perspective (“What does a Bayes factor feel like?”) by letting people draw balls from an urn and monitor the Bayes factor for an equal distribution of colors. I have since realized that I rediscovered an approach Richard Royall took in his 1997 book “Statistical Evidence: A Likelihood Paradigm”: he also derived labels for likelihood ratios by looking at simple experiments, including ball draws.
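A minimal Python sketch of such an urn experiment, assuming draws with replacement, H0: the proportion of red balls is exactly 0.5 (equal distribution of colors), and H1: the proportion is uniformly distributed between 0 and 1. The uniform prior under H1 is an assumption for illustration, not necessarily the prior used in my original demonstration, and the function names are hypothetical.

```python
import math
import random


def bf01_equal_colors(n_red: int, n_draws: int) -> float:
    """Bayes factor for H0: p = 0.5 against H1: p ~ Uniform(0, 1).

    Both marginal likelihoods refer to the exact observed sequence of
    draws, so the binomial coefficient cancels.
    """
    n_other = n_draws - n_red
    log_m0 = n_draws * math.log(0.5)
    # Under a uniform prior the marginal likelihood is the Beta function
    # B(n_red + 1, n_other + 1), computed here in log space for stability.
    log_m1 = (math.lgamma(n_red + 1) + math.lgamma(n_other + 1)
              - math.lgamma(n_draws + 2))
    return math.exp(log_m0 - log_m1)


def monitor_urn(p_red: float, n_draws: int, seed: int = 1) -> list[float]:
    """Draw balls one at a time and record BF01 after each draw."""
    rng = random.Random(seed)
    n_red = 0
    trajectory = []
    for i in range(1, n_draws + 1):
        if rng.random() < p_red:
            n_red += 1
        trajectory.append(bf01_equal_colors(n_red, i))
    return trajectory


# For an urn that really is 50/50, BF01 tends to drift upward (toward H0).
for i, bf in enumerate(monitor_urn(p_red=0.5, n_draws=100), start=1):
    if i % 20 == 0:
        print(f"after {i:3d} draws: BF01 = {bf:.2f}")
```

Changing `p_red` (for example to 0.7) lets you watch the Bayes factor move toward H1 instead, which is the kind of hands-on experience the urn demonstration aims for.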

But beyond this experiential approach to LRs, all of the traditions mentioned above have, in some way, proposed **labels or “grades” of evidence**.

These are summarized in my cheat sheet below.

(There’s also a PDF of the cheat sheet).

There is considerable consensus about what counts as “strong evidence” (but this is not necessarily “independent replication”; maybe they just copied each other).
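As an illustration of how such grades are applied, here is a sketch that maps a Bayes factor to the verbal categories listed in Lee & Wagenmakers (2013), which adapt Jeffreys (1961). Other schemes in the reference list use different cut-offs, and assigning values that fall exactly on a boundary to the stronger category is my own arbitrary choice. The function name is hypothetical.

```python
def label_bf10(bf10: float) -> str:
    """Verbal label for a Bayes factor BF10 (evidence for H1 over H0).

    Categories follow Lee & Wagenmakers (2013), adapted from Jeffreys (1961).
    Boundary values go to the stronger category (an arbitrary choice).
    """
    if bf10 <= 0:
        raise ValueError("a Bayes factor must be positive")
    if bf10 == 1:
        return "no evidence"
    # Evidence for H0 is graded on the reciprocal, BF01 = 1 / BF10.
    favoured, strength = ("H1", bf10) if bf10 > 1 else ("H0", 1 / bf10)
    thresholds = [(100, "extreme"), (30, "very strong"),
                  (10, "strong"), (3, "moderate")]
    for cutoff, label in thresholds:
        if strength >= cutoff:
            return f"{label} evidence for {favoured}"
    return f"anecdotal evidence for {favoured}"


print(label_bf10(3.4))
print(label_bf10(0.05))
```

Under this particular scheme, the BF of 3.4 from the talk question would be called “moderate evidence for H1”; a different scheme might attach a different word to the same number.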

But there is also the position that we **do not need labels at all**: the numbers simply speak for themselves! For an elaboration of that position, see Richard Morey’s blog post. Note that Kass & Raftery (1995) are often cited for their grades in the cheat sheet, but according to Richard Morey they actually belong to the “no labels needed” camp (see here and here). On the other hand, EJ Wagenmakers points out that they use their guidelines themselves for interpretation, and asks: “When you propose labels and use them, how are you in the no-labels camp?” Well, decide for yourself (or ask Kass and Raftery personally) whether they belong in the “labels” or the “no-labels” camp.

Now that I have some experience with LRs, I am inclined to follow the “no labels needed” position. But whenever I *explain* Bayes factors to people who are unfamiliar with them, I really long for a descriptive label. I think labels are shortcuts that relieve you of the burden of explaining how to interpret and judge an LR (you can decide for yourself whether that is a good or a bad property of labels).

To summarize: as LRs are not self-explanatory to the typical audience, I think you either need a label (which is self-explanatory, but probably oversimplified and not sufficiently context-dependent), or you should give an introduction on how to interpret and judge these numbers correctly.
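One possible ingredient of such an introduction is to show what a Bayes factor does to prior odds: posterior odds = BF × prior odds. The sketch below applies this to the talk question about a BF of 3.4, assuming only two competing hypotheses and, purely for illustration, prior odds of either 1:1 or 1:4 against H1. The function name is hypothetical.

```python
def posterior_probability(bf10: float, prior_odds: float = 1.0) -> float:
    """Posterior probability of H1, given BF10 and prior odds P(H1)/P(H0).

    Assumes H1 and H0 are the only two hypotheses under consideration.
    """
    posterior_odds = bf10 * prior_odds
    return posterior_odds / (1 + posterior_odds)


print(f"{posterior_probability(3.4):.3f}")        # 0.773 with 1:1 prior odds
print(f"{posterior_probability(3.4, 0.25):.3f}")  # 0.459 with 1:4 prior odds
```

With even prior odds, a BF of 3.4 moves the probability of H1 from 50% to about 77%; with skeptical 1:4 prior odds, the same BF leaves H1 still below 50%. The BF itself is the same in both cases; only the starting point differs.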

Burnham, K. P., & Anderson, D. R. (2002). *Model selection and multimodel inference: A practical information-theoretic approach*. Springer Science & Business Media.

Burnham, K. P., Anderson, D. R., & Huyvaert, K. P. (2011). AIC model selection and multimodel inference in behavioral ecology: some background, observations, and comparisons. *Behavioral Ecology and Sociobiology*, *65*, 23–35. doi:10.1007/s00265-010-1029-6

Symonds, M. R. E., & Moussalli, A. (2011). A brief guide to model selection, multimodel inference and model averaging in behavioural ecology using Akaike’s information criterion. *Behavioral Ecology and Sociobiology*, *65*, 13–21. doi:10.1007/s00265-010-1037-6

Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), *Bayesian Statistics 2* (pp. 249–270). Elsevier.

Jeffreys, H. (1961). *Theory of probability* (3rd ed.). Oxford University Press.

Lee, M. D., & Wagenmakers, E.-J. (2013). *Bayesian cognitive modeling: A practical course*. Cambridge University Press.

Royall, R. M. (1997). *Statistical evidence: A likelihood paradigm*. London: Chapman & Hall.

Royall, R. M. (2000). On the probability of observing misleading statistical evidence. *Journal of the American Statistical Association*, *95*, 760–768. doi:10.2307/2669456

Kass, R. E., & Raftery, A. E. (1995). Bayes factors. *Journal of the American Statistical Association*, *90*, 773–795.

Morey, R. D. (2015). *On verbal categories for the interpretation of Bayes factors* [Blog post]. http://bayesfactor.blogspot.de/2015/01/on-verbal-categories-for-interpretation.html

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. *Psychonomic Bulletin & Review*, *16*, 225–237.
