Felix Schönbrodt

PD Dr. Dipl.-Psych.

A short taxonomy of Bayes factors

[Update Oct 2014: Due to some changes to the Bayes factor calculator webpage, and as I understand BFs much better now, this post has been updated …]

I recently started to familiarize myself with Bayesian statistics. In this post I’ll share some insights I had about Bayes factors (BFs).

What are Bayes factors?

Bayes factors provide a numerical value that quantifies how well a hypothesis predicts the empirical data relative to a competing hypothesis. For example, a BF of 4 indicates: “The empirical data are 4 times more probable if H₁ were true than if H₀ were true.” Hence, the evidence points towards H₁. A BF of 1 means that the data are equally likely under both hypotheses.

More formally, the BF can be written as:

BF_{10} = \frac{p(D|H_1)}{p(D|H_0)}

where D is the data. Hence, the BF is a ratio of probabilities, and is related to the larger class of likelihood-ratio tests.

What researchers usually are interested in is not p(Data | Hypothesis), but rather p(Hypothesis | Data). Using Bayes’ theorem, the former can be transformed into the latter by assuming prior probabilities for the hypotheses. The BF then tells one how to update one’s prior probabilities after having seen the data, using this formula (Berger, 2006):

p(H_1 | D) = \frac{BF}{BF + [1 - p(H_1)]/p(H_1)}

Given a BF of 1, one does not have to update one’s priors. If one holds, for example, equal priors (p(H_1) = p(H_0) = .5), these probabilities do not change after having seen the data.
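As a quick numeric check, this update formula can be sketched in a few lines of Python (the function name posterior_prob_h1 is my own shorthand, not from any package):

```python
def posterior_prob_h1(bf10, prior_h1):
    """Posterior p(H1 | D) from a raw BF_10 and the prior p(H1),
    following p(H1 | D) = BF / (BF + [1 - p(H1)] / p(H1))."""
    return bf10 / (bf10 + (1 - prior_h1) / prior_h1)

# BF = 1 with equal priors: the posterior stays at the prior
print(posterior_prob_h1(1, 0.5))   # 0.5
# BF = 4 with equal priors: p(H1 | D) = 4 / (4 + 1) = 0.8
print(posterior_prob_h1(4, 0.5))   # 0.8
```

Note how the same BF of 4 moves a skeptical prior much less: with p(H_1) = .1, the posterior only reaches about .31.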

The best detailed introduction to BFs I know of can be found in Richard Morey’s blog posts [1] [2] [3]. Also helpful is the ever-growing tutorial page for the BayesFactor package. (For other introductions to BFs, see Wikipedia, Bayesian-Inference, the classic paper by Kass and Raftery, 1995, or Berger, 2006.)

Although many authors agree about the many theoretical advantages of BFs, until recently it was complicated and unclear how to compute a BF even for simple standard designs (Rouder, Morey, Speckman, & Province, 2012). Fortunately, over the last few years default Bayes factors for several standard designs have been developed (Rouder et al., 2012; Rouder, Speckman, Sun, Morey, & Iverson, 2009; Morey & Rouder, 2011). For example, for a two-sample t test, a BF can be derived simply by plugging the t value and the sample sizes into a formula. The BF is easy to compute with the R package BayesFactor (Morey & Rouder, 2013) or with online calculators [1][2].

Flavors of Bayes factors

When I started to familiarize myself with BFs, I was sometimes confused, as the same number seemed to mean different things in different publications. And indeed, four types of Bayes factors can be distinguished. “Under the hood”, all four types carry the same information, but you have to be aware which type has been employed in a specific case.

The first distinction is whether the BF indicates “H_0 over H_1” (= BF_{01}) or “H_1 over H_0” (= BF_{10}). A BF_{01} of 2 means “The data are 2 times more likely under H_0 than under H_1”, while the same situation corresponds to a BF_{10} of 0.5 (i.e., the reciprocal, 1/2). Intuitively, I prefer larger values to be “better”, and as I usually would like to have evidence for H_1 rather than for H_0, I usually prefer the BF_{10}. However, if your goal is to show evidence for the H_0, then the BF_{01} is easier to communicate (compare: “The data are 0.1 times as likely under the alternative” vs. “The data are 10 times more likely under the null than under the alternative”).

The second distinction is whether one reports the raw BF or the natural logarithm of the BF (the log(BF) has also been called the “weight of evidence”; Good, 1985). The logarithm has the advantage that the scale above 1 (evidence for H_1) is symmetric to the scale below 1 (evidence for H_0). In the previous example, a BF_{10} of 2 is equivalent to a BF_{01} of 0.5. Taking the log of both values leads to log(BF_{10}) = 0.69 and log(BF_{01}) = -0.69: same value, reversed sign. This makes the log(BF) ideal for visualization, as the scale is linear in both directions. The following graphic shows the relationship between raw and log BFs:
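The relationship between the reciprocal and the log flavors can be verified in a couple of lines (a plain Python sketch, independent of any statistics package):

```python
import math

bf10 = 2.0           # data twice as likely under H1 than under H0
bf01 = 1 / bf10      # reciprocal flavor: 0.5

log_bf10 = math.log(bf10)   # evidence for H1 on the log scale
log_bf01 = math.log(bf01)   # same magnitude, reversed sign

print(round(log_bf10, 2), round(log_bf01, 2))   # 0.69 -0.69
```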

Figure 1 (Bayesfactor_Overview1)

As you can see in the table of Figure 1, different authors use different flavors. This often makes sense, as we sometimes want to communicate evidence for the H_1 and sometimes for the H_0. However, for the uninitiated it can sometimes be confusing.

Usually, tables in publications report the raw BF (raw- or raw+). Plots, in contrast, typically use the log scale, for example:

[Figure: example plot using the log(BF) scale]

Figure 2 shows conversion paths of the different BF flavors:

[Figure 2: Bayesfactor_Overview2]

The user interface functions of the BayesFactor package always print the raw BF_{10}. Internally, however, the BF is stored as log(BF_{10}).

Hence, you have to be careful when you directly use the backend utility functions, such as ttest.tstat. These functions return the log(BF_{10}). As the conversion table shows, you have to exp() that number to get the raw BF. Check the documentation of the functions if you are unsure which flavor is returned.
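To illustrate the conversion itself (this is plain Python, not the R API), going from the stored log(BF_{10}) back to the raw value is a single exp():

```python
import math

def log_bf_to_raw(log_bf10):
    """Convert a log(BF_10), as returned by backend functions such as
    ttest.tstat in the BayesFactor R package, into a raw BF_10."""
    return math.exp(log_bf10)

# A log(BF_10) of 0 corresponds to a raw BF of 1 (no evidence either way)
print(log_bf_to_raw(0.0))                        # 1.0
# Round trip: raw BF of 4 -> log -> back to raw
print(round(log_bf_to_raw(math.log(4)), 6))      # 4.0
```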

Related posts: Exploring the robustness of Bayes Factors: A convenient plotting function

References

Berger, J. O. (2006). Bayes factors. In S. Kotz, N. Balakrishnan, C. Read, B. Vidakovic, & N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences, vol. 1 (2nd ed.) (pp. 378–386). Hoboken, NJ: Wiley.
Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian Statistics 2 (pp. 249–270). Elsevier.

Morey, R. D. & Rouder, J. N. (2011). Bayes factor approaches for testing interval null hypotheses. Psychological Methods, 16(4), 406–419. PMID: 21787084. doi:10.1037/a0024377

Morey, R. D., & Rouder, J. N. (2013). BayesFactor: Computation of Bayes factors for common designs. R package version 0.9.4. Retrieved from http://CRAN.R-project.org/package=BayesFactor

Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56(5), 356–374. doi:10.1016/j.jmp.2012.08.001

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237.


8 Responses to “A short taxonomy of Bayes factors”

  1. HG says:

    Excellent post! Great job!

  2. Kishore Rathi says:

    Dear Felix

    Interesting post. I have a completely unrelated question: Which software did you use to produce the figures 1 & 2 in your post. I am trying to identify a suitable open-source software to produce figures like yours.

    Appreciate your help very much.

    regards

    K

  3. CL says:

Thanks, this is a clear and helpful explanation.

  4. […] the free lunch in inference” Jeff Rouder and colleagues argue that in the case of the default Bayes factor for t tests the choice of the H1 prior distribution does not make a strong difference (see Figure […]

  5. […] likely than the H0 to speak of evidence for an effect). For an introduction to Bayes factors see here, here, or […]

  6. Mayo says:

    One of the biggest problems with Bayes factors is that they are fairly meaningless, having all sorts of different values in different contexts. The numbers don’t indicate different amounts of comparative evidence in the different cases. Second, their values depend on what is chosen as the alternative. Are these point against point hypotheses? They are not exhaustive. If one tries to use a bayesian catchall factor (everything else including hypotheses not yet thought of), you get into other troubles.
    All the exs. I see of BFs vastly exaggerate the evidence against the null. You just pick an alternative that makes the data maximally likely. Most seriously of all, there is no error control. How the hypotheses were formulated, selection effects, etc. do not alter the Bayes ratio. Unless you go on to compute P(LR > observed;H) for various hypotheses. Then of course one is doing error statistics.
    I should note that non-bayesians do not compute P(D|Ho)—this is just a howler often pinned on them. They compute things like Pr(d(X) < d(x);Ho). In a one-sided positive test of a mean m mo, say, a high value for this probability, .99,say, warrants m > mo (severity principle).
    by contrast, what does it mean to have a Bayesian ratio of 10, 20, 500—?
