Two meanings of priors, part I: The plausibility of models

by Angelika Stefan & Felix Schönbrodt

When reading about Bayesian statistics, you regularly come across terms like “objective priors”, “prior odds”, “prior distribution”, and “normal prior”. However, it may not be intuitively clear that the meaning of “prior” differs in these terms. In fact, there are two meanings of “prior” in the context of Bayesian statistics: (a) prior plausibilities of models, and (b) the quantification of uncertainty about model parameters. As this often leads to confusion for novices in Bayesian statistics, we want to explain these two meanings of priors in the next two blog posts*. The current blog post covers the first meaning of priors; part II covers the second.

The first meaning of “prior”: Prior plausibility for models

In this context, the term “prior” captures a researcher’s personal assumptions about the probability of a hypothesis, p(H1), relative to a competing hypothesis, which has the probability p(H2). Hence, the meaning of this prior is “how plausible do you deem one model relative to another model before looking at your data”. The ratio of the two model priors, that is “how probable do you consider H1 compared to H2”, is called the “prior odds”: p(H1) / p(H2).

The first meaning of priors is used in the context of Bayes factor analysis, where you compare two different hypotheses. In Bayes factor analysis, prior odds are updated by the likelihood ratio of the two hypotheses, which contains the information from the data, and result in the posterior odds (“what you believe after looking at your data”):

$\large \frac{Posterior_{H1}}{Posterior_{H2}} =\frac{Likelihood_{H1}}{Likelihood_{H2}}\cdot\frac{Prior_{H1}}{Prior_{H2}}$
The prior belief is called “subjective”, but this label does not imply that it is “arbitrary”, “unprincipled”, or “irrational”. On the contrary, the prior belief can (and preferably should) be informed by previous data or experiences. For example, it can be a belief that started from an equipoise (50/50) position but has been repeatedly updated by data. Within the bounds of rationality and consistency, people can still arrive at considerably different prior beliefs and might have good arguments for their choice, which is why it is called “subjective”. However, initially differing degrees of belief will converge as more and more evidence comes in. We will observe this in the following example.

The incredible tea tasting abilities of Lady Muriel Bristol

Fig. 1: Muriel Bristol surprising her husband (William) by making the correct guess for the fifth time in a row (reenacted scene)

The classical tea-tasting experiment has already been described in the context of Bayesian hypothesis testing by Lindley (1993). We will present a simpler form here. Dr. Muriel Bristol, a scientist working in the field of algal biology who was acquainted with the statistician R. A. Fisher, claimed that she could tell whether the milk was put into the cup before or after the tea infusion when preparing tea with milk. However, Mr. Fisher considered this very unlikely.
So they decided to run an experiment: Muriel Bristol tasted several cups of tea in a row, guessing the preparation procedure for each cup. Unlike in the original story, where inferential statistics were consulted to solve the disagreement, we will employ Bayesian statistics to track how prior convictions in this example change. If Muriel Bristol makes her guesses based only on chance, as Mr. Fisher supposes, she has a probability of success of 50% in each trial. Before observing her performance, Mr. Fisher should therefore expect her to be right about the procedure in about 50% of all trials. We can therefore assume a point hypothesis: H_Fisher: Success rate (SR) = 0.5. Muriel Bristol, on the other hand, is very confident in her divination skills. She claims to get 80% of trials correct, which can likewise be captured in a point hypothesis: H_Muriel: SR = 0.8.

Two observers with different prior beliefs

To introduce prior beliefs about hypotheses and show how they change with incoming evidence, we want to introduce two additional persons. The first one is a slightly skeptical observer who tends to favor H_Fisher, but does not completely rule out that Mrs. Bristol could be right with her hypothesis. More formally, we could describe this position as: P(H_Fisher) = 0.6 and P(H_Muriel) = 0.4. This means that his prior odds are P(H_Fisher)/P(H_Muriel) = 3:2. Fisher’s hypothesis is 1.5 times more likely to him than Muriel Bristol’s hypothesis.
The second additional person we would like to introduce is William, Muriel Bristol’s loving husband, who fervently advocates her position. He knows his wife would never (or at least very rarely) make wrong claims, concerning tea preparation and all other issues of their marriage. He therefore assigns a much higher subjective probability to her hypothesis (P(H_Muriel) = 0.9) than to the one of Mr. Fisher (P(H_Fisher) = 0.1). His prior odds are therefore P(H_Fisher)/P(H_Muriel) = 1:9. Please note that the content of the hypotheses (the proposed success rates 0.5 and 0.8, which are the parameters of the model) is logically independent of the probability of the hypotheses (priors) that our two observers have.
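As a small illustration, the two observers' prior odds can be computed directly from their prior probabilities. This is only a sketch: the function name `prior_odds` and the variable names are our own choices for this example, and exact fractions are used so that the odds appear as 3/2 and 1/9 rather than as rounded decimals.

```python
from fractions import Fraction


def prior_odds(p_fisher: Fraction, p_muriel: Fraction) -> Fraction:
    """Return the prior odds P(H_Fisher) / P(H_Muriel)."""
    return p_fisher / p_muriel


# Slightly skeptical observer: P(H_Fisher) = 0.6, P(H_Muriel) = 0.4
skeptic_odds = prior_odds(Fraction(6, 10), Fraction(4, 10))

# William Bristol: P(H_Fisher) = 0.1, P(H_Muriel) = 0.9
william_odds = prior_odds(Fraction(1, 10), Fraction(9, 10))

print(skeptic_odds)  # 3/2
print(william_odds)  # 1/9
```

Note that the success rates 0.5 and 0.8 do not appear anywhere in this computation, which reflects the point that the content of a hypothesis and its prior plausibility are separate things.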

How to update prior beliefs about hypotheses

During the process of hypothesis testing, these two priors are updated with the available evidence. It is reported that Muriel Bristol’s performance in the experiment was extraordinarily good. We therefore assume that out of the first 6 trials of the experiment she got 5 correct.
With this information, we can now compute the likelihood of the data under each of the hypotheses (for more information on the computation of likelihoods, see Alexander Etz’s blog):
$L_{Fisher} = 0.5^5 \cdot 0.5^1 = 0.016$
$L_{Muriel} = 0.8^5 \cdot 0.2^1 = 0.066$
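The two likelihoods above can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `likelihood` is ours. Like the formulas above, it omits the binomial coefficient (the number of ways to arrange 5 successes in 6 trials), which is identical under both hypotheses and therefore cancels in any ratio of the two likelihoods.

```python
def likelihood(success_rate: float, n_correct: int, n_wrong: int) -> float:
    """Likelihood of one specific sequence of correct and wrong guesses,
    given a fixed success rate per trial (binomial coefficient omitted)."""
    return success_rate ** n_correct * (1 - success_rate) ** n_wrong


# Observed data: 5 correct guesses and 1 wrong guess in 6 trials
l_fisher = likelihood(0.5, n_correct=5, n_wrong=1)
l_muriel = likelihood(0.8, n_correct=5, n_wrong=1)

print(round(l_fisher, 3))  # 0.016
print(round(l_muriel, 3))  # 0.066
```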
The computation of the likelihoods does not involve the prior model probabilities of our observers. What can be seen is that the data are more likely under Muriel Bristol’s hypothesis than under Mr. Fisher’s. This should not come as a surprise as Muriel Bristol claimed that she could make a very high percentage of right guesses and the data show a very high percentage of right guesses whereas Mr. Fisher assumed a much lower percentage of right guesses. To emphasize this difference in likelihoods and to assign it a numerical value, we can compute the likelihood ratio (Bayes factor):
$BF_{FM} = LR_{FM} = \frac{L_{Fisher}}{L_{Muriel}} = 0.016 / 0.066 = 0.238$
$BF_{MF} = LR_{MF} = \frac{L_{Muriel}}{L_{Fisher}} = 0.066 / 0.016 = 4.19$
This ratio means that the data are 4.19 times as likely under Mrs. Bristol’s hypothesis as under Mr. Fisher’s hypothesis. It does not matter how you order the likelihoods in the fraction; the meaning remains the same (see this blog post).
How does this likelihood ratio change the prior odds of both our slightly skeptical observer and William Bristol? Bayes’ theorem shows that prior odds can be updated by multiplying them by the likelihood ratio (Bayes factor):
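To make the two orderings concrete, here is a short sketch that computes both Bayes factors from unrounded likelihoods (the variable names are ours). It also shows that the two numbers are reciprocals of each other, which is why the ordering in the fraction does not change the conclusion.

```python
# Likelihoods of 5 correct and 1 wrong guess under each point hypothesis
l_fisher = 0.5 ** 5 * 0.5 ** 1
l_muriel = 0.8 ** 5 * 0.2 ** 1

bf_fm = l_fisher / l_muriel  # evidence for Fisher relative to Muriel
bf_mf = l_muriel / l_fisher  # evidence for Muriel relative to Fisher

print(round(bf_fm, 3))  # 0.238
print(round(bf_mf, 2))  # 4.19

# The two Bayes factors are reciprocals (up to floating-point error)
print(abs(bf_fm * bf_mf - 1.0) < 1e-12)  # True
```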
$\frac{Posterior(H_{Fisher})}{Posterior(H_{Muriel})} = \frac{Prior(H_{Fisher})}{Prior(H_{Muriel})} \cdot BF_{FM}$
First, we will focus on the posterior odds of the slightly skeptical observer. To recap, the slightly skeptical observer had assigned a probability of 0.6 to Mr. Fisher’s hypothesis and a probability of 0.4 to Muriel Bristol’s hypothesis before seeing the data, which resulted in prior odds of 3:2 for Mr. Fisher’s hypothesis. How do these convictions change now that the experiment has been conducted? To examine this, we simply have to insert all known values into the equation:
$\frac {3}{2} \cdot 0.238 = 0.357 = \frac {1}{2.8}$
This shows that the prior odds of the slightly skeptical observer changed from 3:2 to posterior odds of 1:2.8. This means that whereas before the experiment the slightly skeptical observer had deemed Mr. Fisher’s hypothesis more plausible than Mrs. Bristol’s hypothesis, he changed his opinion after seeing the data, now preferring Mrs. Bristol’s hypothesis over Mr. Fisher’s.
The same equation can be applied to William Bristol’s prior odds:
$\frac {1}{9} \cdot 0.238 = 0.0264 = \frac {1}{37.9}$
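The same update can be sketched in code for both observers. The function names `posterior_odds` and `odds_to_probability` are our own, and the conversion from odds to probabilities is an addition for illustration; it assumes that H_Fisher and H_Muriel are the only two hypotheses under consideration. Because the sketch keeps unrounded intermediate values, the last digit can differ slightly from a hand calculation that rounds at each step.

```python
def posterior_odds(prior_odds: float, bayes_factor: float) -> float:
    """Update prior odds by multiplying them by the Bayes factor."""
    return prior_odds * bayes_factor


def odds_to_probability(odds: float) -> float:
    """Convert odds a:b (given as a/b) into the probability a / (a + b).
    Assumes only the two hypotheses in the odds are under consideration."""
    return odds / (1 + odds)


# Bayes factor for Fisher relative to Muriel (5 correct, 1 wrong guess)
bf_fm = (0.5 ** 5 * 0.5 ** 1) / (0.8 ** 5 * 0.2 ** 1)

# Prior odds P(H_Fisher) / P(H_Muriel) of the two observers
observers = {"skeptical observer": 3 / 2, "William Bristol": 1 / 9}

results = {}
for name, prior in observers.items():
    post = posterior_odds(prior, bf_fm)
    results[name] = post
    print(f"{name}: posterior odds = {post:.4f} (1:{1 / post:.1f}), "
          f"P(H_Fisher | data) = {odds_to_probability(post):.3f}")
```

Running this gives posterior odds of about 1:2.8 for the slightly skeptical observer and roughly 1:38 for William Bristol.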
What we can notice is that, after taking the data into consideration, both observers’ posterior odds display a higher amount of agreement with Muriel Bristol and reduced confidence in Mr. Fisher’s hypothesis. Whereas the convictions of the slightly skeptical observer were changed in favor of Muriel Bristol’s hypothesis after the experiment, William Bristol’s prior convictions were strengthened.
Something else you can notice is that, compared to William Bristol, the slightly skeptical observer still assigns a higher plausibility to Mr. Fisher’s hypothesis. This rank order between the two observers will remain no matter what the data look like, because both sets of prior odds are multiplied by the same Bayes factor. Even if Muriel Bristol made, say, 100/100 correct guesses, the slightly skeptical observer would trust less in her hypothesis than her husband. However, with increasing evidence, the absolute difference between the two observers’ posterior probabilities will decrease more and more.
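The following sketch illustrates this convergence for a hypothetical run in which Muriel Bristol guesses every cup correctly. The trial counts (0, 5, 20, 100) and the function name `posterior_prob_muriel` are our own choices for illustration, not data from the experiment. Exact fractions are used because, with ordinary floating-point numbers, both posterior probabilities would round to 1.0 after 100 correct guesses and the remaining difference would disappear from view.

```python
from fractions import Fraction


def posterior_prob_muriel(prior_muriel: Fraction, n_correct: int,
                          n_wrong: int) -> Fraction:
    """Posterior probability of H_Muriel (SR = 0.8) versus H_Fisher (SR = 0.5),
    assuming these are the only two hypotheses under consideration."""
    prior_odds_mf = prior_muriel / (1 - prior_muriel)
    l_muriel = Fraction(4, 5) ** n_correct * Fraction(1, 5) ** n_wrong
    l_fisher = Fraction(1, 2) ** (n_correct + n_wrong)
    post_odds_mf = prior_odds_mf * (l_muriel / l_fisher)
    return post_odds_mf / (1 + post_odds_mf)


gaps = []
for n in (0, 5, 20, 100):  # hypothetical numbers of consecutive correct guesses
    skeptic = posterior_prob_muriel(Fraction(2, 5), n, 0)
    william = posterior_prob_muriel(Fraction(9, 10), n, 0)
    gap = william - skeptic
    gaps.append(gap)
    print(f"n = {n:3d}: skeptic = {float(skeptic):.6f}, "
          f"William = {float(william):.6f}, difference = {float(gap):.2e}")
```

In this sketch the difference stays positive for every n (William always trusts his wife's hypothesis more), but it shrinks from 0.5 before any data are seen toward zero as the number of correct guesses grows.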

Summary

This blog post explained the first meaning of “prior” in the context of Bayesian statistics. It can be defined as the subjective plausibility a researcher assigns to a hypothesis compared to another hypothesis before seeing the data. As illustrated in the tea-tasting example, these prior beliefs are updated with incoming evidence in the research process. In the next blog post, we will explain a second meaning of “priors”: the quantification of uncertainty about model parameters.
Continue reading part II: Quantifying uncertainty about model parameters
We want to thank Eric-Jan Wagenmakers for helpful comments on a previous version of the post.
*As a note: Both meanings can in fact be unified, but for didactic purposes we think it makes sense to keep them distinct as a start.

References

Dienes, Z. (2011). Bayesian versus orthodox statistics: Which side are you on? Perspectives on Psychological Science, 6(3), 274-290. http://doi.org/10.1177/1745691611406920
Dienes, Z. (2016). How Bayes factors change scientific practice. Journal of Mathematical Psychology, 72, 78-89. http://doi.org/10.1016/j.jmp.2015.10.003
Lindley, D. V. (1993). The analysis of experimental data: The appreciation of tea and wine. Teaching Statistics, 15(1), 22-25. http://dx.doi.org/10.1111/j.1467-9639.1993.tb00252.x
Rouder, J. N., Morey, R. D., Verhagen, J., Province, J. M., & Wagenmakers, E. J. (2016a). Is there a free lunch in inference? Topics in Cognitive Science, 8, 520–547. http://doi.org/10.1111/tops.12214
Rouder, J. N., Morey, R. D., & Wagenmakers, E. J. (2016b). The Interplay between Subjectivity, Statistical Practice, and Psychological Science. Collabra, 2(1), 6–12. http://doi.org/10.1525/collabra.28
