January 21, 2014

**[Update Oct 2014: Due to some changes to the Bayes factor calculator webpage, and as I understand BFs much better now, this post has been updated …]**

I started to familiarize myself with Bayesian statistics. In this post I’ll show some insights I had about *Bayes factors* (BF).

Bayes factors provide a numerical value that quantifies how well a hypothesis predicts the empirical data relative to a competing hypothesis. For example, if the BF is 4, this indicates: “These empirical data are 4 times more probable if H₁ were true than if H₀ were true.” Hence, the evidence points towards H₁. A BF of 1 means that the data are equally likely to have occurred under both hypotheses.

More formally, the BF can be written as:

$$BF_{10} = \frac{p(D \mid H_1)}{p(D \mid H_0)}$$

where D is the data. Hence, the BF is a ratio of probabilities, and is related to the larger class of likelihood-ratio tests.

What researchers usually are interested in is not p(Data | Hypothesis), but rather p(Hypothesis | Data). Using Bayes’ theorem, the former can be transformed into the latter by assuming prior probabilities for the hypotheses. The BF then tells one how to update one’s prior probabilities after having seen the data, using this formula (Berger, 2006):

$$\frac{p(H_1 \mid D)}{p(H_0 \mid D)} = \frac{p(D \mid H_1)}{p(D \mid H_0)} \times \frac{p(H_1)}{p(H_0)}$$

Given a BF of 1, one does not have to update one’s priors. If one holds, for example, equal priors (p(H1) = p(H0) = .5), these probabilities do not change after having seen the data of the original study.
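To make the updating step concrete, here is a minimal Python sketch. It assumes that H1 and H0 are the only two hypotheses under consideration, so that their probabilities sum to 1; the function name `posterior_prob_h1` is purely illustrative and not part of any package.

```python
def posterior_prob_h1(bf10, prior_h1=0.5):
    """Return p(H1 | D) given BF10 and the prior p(H1).

    Assumption: H0 and H1 are the only hypotheses, so p(H0) = 1 - p(H1).
    """
    if not 0.0 < prior_h1 < 1.0:
        raise ValueError("prior_h1 must lie strictly between 0 and 1")
    prior_odds = prior_h1 / (1.0 - prior_h1)   # p(H1) / p(H0)
    posterior_odds = bf10 * prior_odds         # posterior odds = BF10 * prior odds
    return posterior_odds / (1.0 + posterior_odds)


# With equal priors, a BF of 1 leaves the probabilities untouched,
# while a BF of 4 moves p(H1) from .5 to .8.
for bf10 in (1.0, 4.0, 0.25):
    print(f"BF10 = {bf10}: p(H1 | D) = {posterior_prob_h1(bf10):.2f}")
```

The same BF leads to different posterior probabilities for readers who start with different priors, which is why the BF is reported rather than the posterior itself.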

The best detailed introduction to BFs I know of can be found in Richard Morey’s blog posts [1] [2][3]. Also helpful is the ever-growing tutorial page for the BayesFactor package. (For other introductions to BFs, see Wikipedia, Bayesian-Inference, the classic paper by Kass and Raftery, 1995, or Berger, 2006).

Although many authors agree about the many theoretical advantages of BFs, until recently it was complicated and unclear how to compute a BF even for simple standard designs (Rouder, Morey, Speckman, & Province, 2012). Fortunately, in recent years *default Bayes factors* for several standard designs have been developed (Rouder et al., 2012; Rouder, Speckman, Sun, Morey, & Iverson, 2009; Morey & Rouder, 2011). For example, for a two-sample *t* test, a BF can be derived simply by plugging the *t* value and the sample sizes into a formula. The BF is easy to compute with the R package BayesFactor (Morey & Rouder, 2013), or with online calculators [1][2].
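To illustrate what “plugging the *t* value and the sample sizes into a formula” can look like, here is a rough Python sketch. It is my own transcription of the JZS Bayes factor in Eq. 1 of Rouder et al. (2009) with prior scale r = 1, adapted to two samples via the effective sample size n1·n2/(n1 + n2) and df = n1 + n2 − 2. The function name `jzs_bf01`, the `steps` parameter, and the simple midpoint integration are assumptions made for illustration; this is not the code of the BayesFactor package, which may use a different default prior scale, so the numbers need not match its output.

```python
import math


def jzs_bf01(t, n1, n2, steps=20000):
    """Sketch of BF01 for a two-sample t test (prior scale r = 1 assumed)."""
    n_eff = n1 * n2 / (n1 + n2)   # effective sample size for two groups
    df = n1 + n2 - 2              # degrees of freedom

    # Likelihood term under H0 (constants shared with H1 cancel in the ratio)
    null_term = (1 + t**2 / df) ** (-(df + 1) / 2)

    def integrand(g):
        shrink = 1 + n_eff * g
        like = shrink ** -0.5 * (1 + t**2 / (shrink * df)) ** (-(df + 1) / 2)
        # Inverse-chi-square(1) prior density on g
        prior = (2 * math.pi) ** -0.5 * g ** -1.5 * math.exp(-1 / (2 * g))
        return like * prior

    # Midpoint rule after substituting g = u / (1 - u), dg = du / (1 - u)^2,
    # which maps the range (0, infinity) onto (0, 1).
    h = 1.0 / steps
    alt_term = 0.0
    for i in range(steps):
        u = (i + 0.5) * h
        g = u / (1 - u)
        alt_term += integrand(g) / (1 - u) ** 2 * h

    return null_term / alt_term


bf01 = jzs_bf01(t=2.5, n1=30, n2=30)
print(f"BF01 = {bf01:.3f}, BF10 = {1 / bf01:.3f}")
```

For real analyses I would still rely on the BayesFactor package or the online calculators and treat a sketch like this only as a way to see which ingredients enter the computation.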

When I started to familiarize myself with BFs, I was sometimes confused, as the same number seemed to mean different things in different publications. And indeed, **four types of Bayes factors** can be distinguished. “Under the hood”, all four types are identical, but you have to be aware which type has been employed in the specific case.

The first distinction is whether the BF indicates “H₁ over H₀” (= BF₁₀) or “H₀ over H₁” (= BF₀₁). A BF₁₀ of 2 means “The data are 2 times more likely to have occurred under H₁ than under H₀”, while the same situation would be a BF₀₁ of 0.5 (i.e., the reciprocal value 1/2). Intuitively, I prefer larger values to be “better”, and as I usually would like to have evidence for H₁ instead of H₀, I usually prefer the BF₁₀. However, if your goal is to show evidence *for* the H0, then BF₀₁ is easier to communicate (compare: “The data are 0.1 times as likely under the alternative” vs. “The data show 10 times more evidence for the null than for the alternative”).

The second distinction is whether one reports the raw BF or the natural logarithm of the BF (the log(BF) has also been called “*weight of evidence*”; Good, 1985). The logarithm has the advantage that the scale above 1 (evidence for H₁) is identical to the scale below 1 (evidence for H₀). In the previous example, a BF₁₀ of 2 is equivalent to a BF₀₁ of 0.5. Taking the log of both values leads to log(BF₁₀) = 0.69 and log(BF₀₁) = -0.69: same value, reversed sign. This makes the log(BF) ideal for visualization, as the scale is linear in both directions. The following graphic shows the relationship between raw and log BF:

As you can see in the table in Figure 1, different authors use different flavors. This often makes sense, as we sometimes want to communicate evidence for the H1, and sometimes for the H0. However, for the uninitiated this can sometimes be confusing.

Usually, tables in publications report the raw BF (raw- or raw+). Plots, in contrast, typically use the log scale, for example:

Figure 2 shows conversion paths of the different BF flavors:

The user interface functions of the BayesFactor package always print the raw BF₁₀. Internally, however, the BF is stored as log(BF₁₀).

Hence, you have to be careful when you directly use the backend utility functions, such as `ttest.tstat`. These functions return the log(BF₁₀). As the conversion table shows, you have to `exp()` that number to get the raw BF. Check the documentation of the functions if you are unsure which flavor is returned.
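The conversions between the four flavors are simple enough to sketch in a few lines of Python. The starting point here is a log(BF₁₀) value, the form described above as the internal storage format; the helper name `bf_flavors` is hypothetical and only serves this illustration.

```python
import math


def bf_flavors(log_bf10):
    """Return all four BF flavors, starting from the natural log of BF10."""
    return {
        "BF10": math.exp(log_bf10),       # raw, H1 over H0
        "BF01": math.exp(-log_bf10),      # raw, H0 over H1 (reciprocal)
        "log(BF10)": log_bf10,            # log scale, H1 over H0
        "log(BF01)": -log_bf10,           # log scale, H0 over H1 (sign flipped)
    }


# The running example from the text: BF10 = 2, i.e. log(BF10) = log(2)
for name, value in bf_flavors(math.log(2)).items():
    print(f"{name:>10} = {value:.2f}")
```

Switching the direction of the comparison is a reciprocal on the raw scale and a sign flip on the log scale, and `exp()` / `log()` move between the two scales.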

Related posts: Exploring the robustness of Bayes Factors: A convenient plotting function

Berger, J. O. (2006). Bayes factors. In S. Kotz, N. Balakrishnan, C. Read, B. Vidakovic, & N. L. Johnson (Eds.), *Encyclopedia of Statistical Sciences, vol. 1 (2nd ed.)* (pp. 378–386). Hoboken, NJ: Wiley.

Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), *Bayesian Statistics 2* (pp. 249–270). Elsevier.

Morey, R. D. & Rouder, J. N. (2011). Bayes factor approaches for testing interval null hypotheses. Psychological Methods, 16(4), 406–419. PMID: 21787084. doi:10.1037/a0024377

Morey, R. D. & Rouder, J. N. (2013). BayesFactor: Computation of Bayes factors for common designs. R package version 0.9.4. Retrieved from http://CRAN.R-project.org/package=BayesFactor

Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56(5), 356–374. doi:10.1016/j.jmp.2012.08.001

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237.

Excellent post! Great job!

Dear Felix

Interesting post. I have a completely unrelated question: Which software did you use to produce Figures 1 & 2 in your post? I am trying to identify suitable open-source software to produce figures like yours.

Appreciate your help very much.

regards

K

I do pretty much everything with open source/free software – except these plots 😉

I made them with OmniGraffle, which is only available for Mac OS. It’s not cheap (~100$), but by far the best I have seen for this task …

Inkscape is quite nice for vector graphics:

http://inkscape.org/en/

Thanks, this is a clear and helpful explanation.

One of the biggest problems with Bayes factors is that they are fairly meaningless, having all sorts of different values in different contexts. The numbers don’t indicate different amounts of comparative evidence in the different cases. Second, their values depend on what is chosen as the alternative. Are these point against point hypotheses? They are not exhaustive. If one tries to use a bayesian catchall factor (everything else including hypotheses not yet thought of), you get into other troubles.

All the exs. I see of BFs vastly exaggerate the evidence against the null. You just pick an alternative that makes the data maximally likely. Most seriously of all, there is no error control. How the hypotheses were formulated, selection effects, etc. do not alter the Bayes ratio. Unless you go on to compute P(LR > observed;H) for various hypotheses. Then of course one is doing error statistics.

I should note that non-bayesians do not compute P(D|Ho)—this is just a howler often pinned on them. They compute things like Pr(d(X) < d(x);Ho). In a one-sided positive test of a mean m mo, say, a high value for this probability, .99,say, warrants m > mo (severity principle).

By contrast, what does it mean to have a Bayesian ratio of 10, 20, 500?

Nice post. Thanks for the callout. I find people have a hard time with Pr(Data|Model). I am starting to ask people if they prefer models that make predictions about data, say as probability statements about where the data lie, before seeing the data. Most do. These predictions are Pr(Data|Model). The Bayes factor is then the relative predictive accuracy of one model over another for the observed data. Best, Jeff
