Interactive exploration of a prior’s impact

Probably the most frequent criticism of Bayesian statistics sounds something like this: “It’s all subjective – with the ‘right’ prior, you can get any result you want.”

One suggested way to address this criticism is a sensitivity analysis (or robustness analysis) that demonstrates how the choice of priors affects the conclusions drawn from the Bayesian analysis. Ideally, it can be shown that the conclusions remain the same for virtually any reasonable prior. In that case, the critique “it’s all in the prior” can be refuted on empirical grounds.

In their recent paper “The p < .05 rule and the hidden costs of the free lunch in inference”, Jeff Rouder and colleagues argue that for the default Bayes factor for t tests, the choice of the H1 prior distribution does not make a strong difference (see their Figure 6, right panel). They conclude that “Prior scale does matter, and may change the Bayes factor by a factor of 2 or so, but it does not change the order of magnitude.” (p. 24).

The default Bayes factor for t tests (Rouder, Speckman, Sun, Morey, & Iverson, 2009) assumes that under H1 effect sizes (expressed as Cohen’s d) follow a Cauchy distribution (this is the prior distribution for H1). The spread of the Cauchy distribution is controlled by the scale parameter r. Depending on the specific research area, one can use a wider (large r, e.g., r = 1.5) or a narrower (small r, e.g., r = 0.5) Cauchy distribution; this corresponds to the prior belief that typically larger or smaller effect sizes are to be expected.

For the two-sample t test, the BayesFactor package for R provides three defaults for the scale parameter:

  • “medium” (r = sqrt(2)/2 ≈ 0.71),
  • “wide” (r = 1), and
  • “ultrawide” (r = sqrt(2) ≈ 1.41).

Here’s a display for these distributions:

[Figure: density curves of the three default Cauchy priors]
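Such a display can be reproduced with a few lines of base R. A minimal sketch (the three r values are the package defaults quoted above; the axis range and line styles are arbitrary choices):

    # Density of the Cauchy prior on effect size (Cohen's d) for the three default scales
    d  <- seq(-4, 4, length.out = 400)
    rs <- c(medium = sqrt(2)/2, wide = 1, ultrawide = sqrt(2))

    plot(d, dcauchy(d, scale = rs["medium"]), type = "l", lty = 1,
         xlab = "Effect size (Cohen's d)", ylab = "Prior density")
    lines(d, dcauchy(d, scale = rs["wide"]),      lty = 2)
    lines(d, dcauchy(d, scale = rs["ultrawide"]), lty = 3)
    legend("topright", legend = names(rs), lty = 1:3, bty = "n")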

For a given effect size: How does the choice of the prior distribution change the resulting Bayes factor?

The following Shiny app demonstrates how the choice of the prior influences the Bayes factor for a given effect size and sample size. Try moving the sliders! You can also provide arbitrary values for r (as comma-separated values; r must be > 0; reasonable values lie between 0.2 and 2).

For a robustness analysis, simply compare the lines at each vertical cut. An important reference is the solid blue line at log(1) = 0, which indicates equal support for H1 and H0. All values above that line favor H1; all values below it favor H0.


As you will see, in most situations the Bayes factors for all r’s are either all above log(1) or all below log(1). That means that, regardless of the choice of the prior, you will come to the same conclusion. There are very few cases where the line for one r is below log(1) while the line for another r is above it; in these cases, different r’s would lead to different conclusions. But in these ambiguous situations the evidence for H1 or H0 always falls in the “anecdotal” region, i.e., very weak evidence. With the default r’s, the ratio of the resulting Bayes factors is indeed at most “a factor of 2 or so”.
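The same robustness check can be done numerically. A minimal sketch with the BayesFactor package (the t value and sample sizes are made up; as described in the taxonomy post below, the backend function ttest.tstat returns the natural log of BF10):

    library(BayesFactor)

    # Hypothetical result: t = 2.3 from a two-sample t test with 20 subjects per group
    t_val <- 2.3; n1 <- 20; n2 <- 20
    rs <- c(0.5, sqrt(2)/2, 1, sqrt(2))

    # ttest.tstat returns log(BF10), so exp() gives the raw Bayes factor
    bf10 <- sapply(rs, function(r)
      exp(ttest.tstat(t = t_val, n1 = n1, n2 = n2, rscale = r)$bf))
    round(bf10, 2)  # do all r's point in the same direction?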

To summarize: within a reasonable range of prior distributions, it is not possible that one prior generates strong evidence for H1 while another prior generates strong evidence for H0. In that sense, the conclusions drawn from a default Bayes factor are robust to the choice of (reasonable) priors.

References

Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E. J. (submitted). The p < .05 rule and the hidden costs of the free lunch in inference. Retrieved from http://pcl.missouri.edu/biblio/author/29

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237.


A short taxonomy of Bayes factors

[Update Oct 2014: Due to some changes to the Bayes factor calculator webpage, and as I understand BFs much better now, this post has been updated …]

I have started to familiarize myself with Bayesian statistics. In this post I share some insights I gained about Bayes factors (BFs).

What are Bayes factors?

Bayes factors provide a numerical value that quantifies how well a hypothesis predicts the empirical data relative to a competing hypothesis. For example, a BF of 4 indicates: “These empirical data are 4 times more probable if H₁ were true than if H₀ were true.” Hence, the evidence points towards H₁. A BF of 1 means that the data are equally likely to occur under both hypotheses.

More formally, the BF can be written as:

BF_{10} = \frac{p(D|H_1)}{p(D|H_0)}

where D is the data. Hence, the BF is a ratio of probabilities and is related to the larger class of likelihood-ratio tests.

What researchers are usually interested in is not p(Data | Hypothesis), but rather p(Hypothesis | Data). Using Bayes’ theorem, the former can be transformed into the latter by assuming prior probabilities for the hypotheses. The BF then tells one how to update one’s prior probabilities after having seen the data, using this formula (Berger, 2006):

p(H_1 | D) = \frac{BF_{10}}{BF_{10} + [1 - p(H_1)] / p(H_1)}

Given a BF of 1, one does not have to update one’s priors. If one holds, for example, equal priors (p(H_1) = p(H_0) = .5), these probabilities do not change after having seen the data.
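A small worked example in R (the helper name posterior_h1 is mine; the formula is the one above):

    # Posterior probability of H1, given BF10 and the prior probability p(H1)
    posterior_h1 <- function(bf10, prior_h1) {
      bf10 / (bf10 + (1 - prior_h1) / prior_h1)
    }

    posterior_h1(bf10 = 4, prior_h1 = .5)  # 0.80: a BF of 4 turns 50% into 80%
    posterior_h1(bf10 = 1, prior_h1 = .5)  # 0.50: a BF of 1 leaves the prior unchanged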

The best detailed introduction to BFs I know of can be found in Richard Morey’s blog posts [1] [2] [3]. Also helpful is the ever-growing tutorial page for the BayesFactor package. (For other introductions to BFs, see Wikipedia, Bayesian-Inference, the classic paper by Kass and Raftery, 1995, or Berger, 2006.)

Although many authors agree about the many theoretical advantages of BFs, until recently it was complicated and unclear how to compute a BF even for simple standard designs (Rouder, Morey, Speckman, & Province, 2012). Fortunately, over the last years default Bayes factors for several standard designs have been developed (Rouder et al., 2012; Rouder, Speckman, Sun, Morey, & Iverson, 2009; Morey & Rouder, 2011). For a two-sample t test, for example, a BF can be derived simply by plugging the t value and the sample sizes into a formula. The BF is easy to compute with the R package BayesFactor (Morey & Rouder, 2013), or with online calculators [1] [2].
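With the BayesFactor package, a default two-sample BF is essentially a one-liner. A minimal sketch (the data are simulated here just for illustration):

    library(BayesFactor)

    # Simulated two-group data with a true effect of d = 0.5
    set.seed(1)
    x <- rnorm(30, mean = 0.5)
    y <- rnorm(30, mean = 0)

    ttestBF(x = x, y = y, rscale = "medium")  # prints the raw BF10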

Flavors of Bayes factors

When I started to familiarize myself with BFs, I was sometimes confused, as the same number seemed to mean different things in different publications. And indeed, four types of Bayes factors can be distinguished. “Under the hood”, all four types are identical, but you have to be aware of which type has been employed in a specific case.

The first distinction is whether the BF indicates “H_0 over H_1” (= BF_{01}) or “H_1 over H_0” (= BF_{10}). A BF_{01} of 2 means “The data are 2 times more likely to occur under H_0 than under H_1”, while the same situation corresponds to a BF_{10} of 0.5 (i.e., the reciprocal, 1/2). Intuitively, I prefer larger values to be “better”, and as I usually would like to have evidence for H_1 rather than for H_0, I usually prefer the BF_{10}. If your goal is to show evidence for H_0, however, the BF_{01} is easier to communicate (compare: “The data are 0.1 times as likely under the alternative” vs. “The data are 10 times more likely under the null than under the alternative”).

The second distinction is whether one reports the raw BF or the natural logarithm of the BF (the log(BF) has also been called the “weight of evidence”; Good, 1985). The logarithm has the advantage that the scale above 1 (evidence for H_1) becomes symmetric to the scale below 1 (evidence for H_0). In the previous example, a BF_{10} of 2 is equivalent to a BF_{01} of 0.5. Taking the log of both values leads to log(BF_{10}) = 0.69 and log(BF_{01}) = -0.69: same value, reversed sign. This makes the log(BF) ideal for visualization, as the scale is linear in both directions. The following graphic shows the relationship between the raw and the log BF:

[Figure 1: Overview table of the four BF flavors (BF_{10} vs. BF_{01}, each on the raw and on the log scale)]

As you can see in the table of Figure 1, different authors use different flavors. This often makes sense, as we sometimes want to communicate evidence for the H1 and sometimes for the H0. For the uninitiated, however, it can sometimes be confusing.

Usually, tables in publications report the raw BF (raw- or raw+). Plots, in contrast, typically use the log scale, for example:

[Screenshot: an example plot showing Bayes factors on the log scale]

Figure 2 shows the conversion paths between the different BF flavors:

[Figure 2: Conversion paths between the different BF flavors]
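In code, these conversion paths are just the reciprocal, log(), and exp(). A quick sketch:

    bf10     <- 2
    bf01     <- 1 / bf10   # 0.5: flip the direction of comparison
    log_bf10 <- log(bf10)  #  0.69
    log_bf01 <- log(bf01)  # -0.69: same magnitude, reversed sign
    exp(log_bf10)          # back to the raw BF10: 2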

The user interface functions of the BayesFactor package always print the raw BF_{10}. Internally, however, the BF is stored as log(BF_{10}).

Hence, you have to be careful when you directly use the backend utility functions, such as ttest.tstat: these functions return the log(BF_{10}). As the conversion paths show, you have to exp() that number to get the raw BF. Check the documentation of the functions if you are unsure which flavor is returned.
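A minimal sketch of this pitfall (t value and sample sizes are made up):

    library(BayesFactor)

    res <- ttest.tstat(t = 2.5, n1 = 20, n2 = 20)
    res$bf       # log(BF10) -- not the raw Bayes factor
    exp(res$bf)  # raw BF10, matching what the user interface functions print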


References

Berger, J. O. (2006). Bayes factors. In S. Kotz, N. Balakrishnan, C. Read, B. Vidakovic, & N. L. Johnson (Eds.), Encyclopedia of Statistical Sciences, vol. 1 (2nd ed.) (pp. 378–386). Hoboken, NJ: Wiley.
Good, I. J. (1985). Weight of evidence: A brief survey. In J. M. Bernardo, M. H. DeGroot, D. V. Lindley, & A. F. M. Smith (Eds.), Bayesian Statistics 2 (pp. 249–270). Elsevier.
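
Kass, R. E., & Raftery, A. E. (1995). Bayes factors. Journal of the American Statistical Association, 90(430), 773–795.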

Morey, R. D., & Rouder, J. N. (2011). Bayes factor approaches for testing interval null hypotheses. Psychological Methods, 16(4), 406–419. PMID: 21787084. doi:10.1037/a0024377

Morey, R. D., & Rouder, J. N. (2013). BayesFactor: Computation of Bayes factors for common designs. R package version 0.9.4. Retrieved from http://CRAN.R-project.org/package=BayesFactor

Rouder, J. N., Morey, R. D., Speckman, P. L., & Province, J. M. (2012). Default Bayes factors for ANOVA designs. Journal of Mathematical Psychology, 56(5), 356–374. doi:10.1016/j.jmp.2012.08.001

Rouder, J. N., Speckman, P. L., Sun, D., Morey, R. D., & Iverson, G. (2009). Bayesian t tests for accepting and rejecting the null hypothesis. Psychonomic Bulletin & Review, 16(2), 225–237.
