-
Notifications
You must be signed in to change notification settings - Fork 0
/
14-prior-probabilities.qmd
422 lines (320 loc) · 18.1 KB
/
14-prior-probabilities.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
# Parameter Estimation with Prior Probabilities
## Predicting Email Conversion Rates
> Our data so far tells us that of the first five people that open an
> email, two of them click the link.
![The beta distribution for our observations so
far](img/14fig01.jpg){#fig-14-01
fig-alt="Beta curve with wide range of possible values. Modus of the conversion rate is about 30-35%"
fig-align="center" width="70%"}
> Unlike in the previous chapter, where we had a pretty narrow spike in
> possible values, here we have a huge range of possible values for the
> true conversion rate because we have very little information to work
> with.
> The 95 percent confidence interval (i.e., a 95 percent chance that our
> true conversion rate is somewhere in that range) is marked to make it
> easier to see. At this point our data tells us that the true
> conversion rate could be anything between 0.05 and 0.8! This is a
> reflection of how little information we've actually acquired so far.
![CDF for our observation](img/14fig02.jpg){#fig-14-02
fig-alt="Beta curve with modus about a conversion rate of about 30-35%"
fig-align="center" width="70%"}
## Taking in Wider Context with Priors
This data (80% conversion rate) seems unrealistic.
> Let's look at some wider context. For blogs listed in the same
> category as yours, the provider's data claims that on average only 2.4
> percent of people who open emails click through to the content.
We learned about producing the `r glossary("posterior probability")`
with the combination of `r glossary("prior probability")` and
`r glossary("likelihood")` distribution in @sec-chap-09. The prior will
change if new data (likelihood) come in. This process is called
`r glossary("Bayesian updating")`.
> As you know by now, in Bayesian terms the data we have observed is our
> likelihood, and the external context information---in this case from
> our personal experience and our email service---is our prior
> probability. Our challenge now is to figure out how to model our
> prior.
> The conversion rate of 2.4 percent from your email provider gives us a
> starting point: now we know we want a beta distribution whose mean is
> roughly 0.024. (The mean of a beta distribution is
> \$\frac{\alpha}{\alpha + \beta}.) However, this still leaves us with a
> range of possible options: Beta(1,41), Beta(2,80), Beta(5,200),
> Beta(24,976), and so on. So which should we use? Let's plot some of
> these out and see what they look like (see @fig-14-03).
::: callout-note
In the meanwhile I learned from [@kruschke2014] that the calculation of
different forms of the beta distribution is very important to model your
prior beliefs. I think there are several R packages and Kruschke is also
offering some scripts for this task. It also important to get a general
impression how $\alpha$ and $\beta$ work together to form very different
beta distribution.
:::
![Comparing different possible prior
probabilities](img/14fig03.jpg){#fig-14-03
fig-alt="Three different beta distributions" fig-align="center"
width="70%"}
::: {.callout-important}
The lower the combined $\alpha + \beta$, the wider our distribution.
:::
> The problem now is that even the most liberal option we have, Beta(1,41), seems a little too pessimistic, as it puts a lot of our probability density in very low values. We’ll stick with this distribution nonetheless, since it is based on the 2.4 percent conversion rate in the data from the email provider, and is the weakest of our priors.
Why not use Beta(2, 80)? It is not so weak but still not very commiting. In my opinion it seems better providing the 2.4 percent conversion rate.
> we can calculate our posterior distribution (the combination of our likelihood and our prior) by simply adding together the parameters for the two beta distributions:
$$
\begin{align*}
Beta(\alpha_{posterior}, \beta_{posterior}) = Beta(\alpha_{likelihood} + \alpha_{prior}, \beta_{likelihood} + \beta_{prior})
\end{align*}
$$ {#eq-add-two-beta}
![Comparing our likelihood (no prior) to our posterior (with prior)](img/14fig04.jpg){#fig-14-04
fig-alt="Two curves: A flat but wide likelihood curve, stretching until 80% (!) and curve for the posterior probability including the prior probability with an extreme spike at about 0.8%" fig-align="center"
width="70%"}
This is a big surprise!
> Even though we’re working with a relatively weak prior, we can see that it has made a huge impact on what we believe are realistic conversion rates. Notice that for the likelihood with no prior, we have some belief that our conversion rate could be as high as 80 percent. … Adding a prior to our likelihood adjusts our beliefs so that they become much more reasonable. But I still think our updated beliefs are a bit pessimistic.
> We wait a few hours to gather more results and now find that out of 100 people who opened your email, 25 have clicked the link!
![Updating our beliefs with more data](img/14fig05.jpg){#fig-14-05
fig-alt="Two spiked curves: The likelihood curve (without prior) has a smaller spike at about 30% in contrast to the higher spike of the posterior probability (with prior) at about 15-18%" fig-align="center"
width="70%"}
> In the morning we find that 300 subscribers have opened their email, and 86 of those have clicked through. @fig-14-06 shows our updated beliefs.
![Our posterior beliefs with even more data added](img/14fig06.jpg){#fig-14-06
fig-alt="Two curves now side by side. The likelihood curve is still more optimistic but the difference to the posterior probability is relatively small (about 30% versus 25%)" fig-align="center"
width="70%"}
Now the two curves are side by side. The likelihood curve is still more optimistic but the difference to the posterior probability is relatively small by visual inspection (about 30% versus 25%).
## Prior as a Means of Quantifying Experience
### Is There a Fair Prior to Use When We Know Nothing?
Will Kurt’s answer is: No!
> In the absence of sufficient data and any prior information, your only honest option is to throw your hands in the air and tell your friend, “I have no clue how to even reason about that question!”
In the [literature](https://www.sciencedirect.com/topics/mathematics/noninformative-prior) there is much talk about a `r glossary("non-informative prior")` with $Beta(1,1)$
![The noninformative prior Beta(1,1)](img/14fig07.jpg){#fig-14-07
fig-alt="A perfectly straight line, with the mean likelihood of 0.5" fig-align="center"
width="70%"}
> As you can see, this is a perfectly straight line, so that all outcomes are then equally likely and the mean likelihood is 0.5.
$$\text{mean of likelihood} = \frac{\alpha}{\alpha + \beta} = \frac{1}{1 + 1} = \frac{1}{2} = 0.5$$ {#eq-mean-noninformative-prior}
::: {.callout-note}
There is much discussion on this topic. People arguing that there is not such thing as a non-informative prior. I want not to go into further details here.
:::
> A Beta(1,1) prior is sometimes used in practice, but you should use it only when you earnestly believe that the two possible outcomes are, as far as you know, equally likely.
## Wrapping Up
> Whenever possible, it’s best to use a prior probability distribution based on actual data. However, often we won’t have data to support our problem, but we either have personal experience or can turn to experts who do. In these cases, it’s perfectly fine to estimate a probability distribution that corresponds to your intuition. Even if you’re wrong, you’ll be wrong in a way that is recorded quantitatively. Most important, even if your prior is wrong, it will eventually be overruled by data as you collect more observations.
## Exercises
Try answering the following questions to see how well you understand priors. The solutions can be found at https://nostarch.com/learnbayes/.
### Exercise 14-1
Suppose you’re playing air hockey with some friends and flip a coin to see who starts with the puck. After playing 12 times, you realize that the friend who brings the coin almost always seems to go first: 9 out of 12 times. Some of your other friends start to get suspicious. Define prior probability distributions for the following beliefs:
- One person who weakly believes that the friend is cheating and the true rate of coming up heads is closer to 70 percent.
- One person who very strongly trusts that the coin is fair and provided a 50 percent chance of coming up heads.
- One person who strongly believes the coin is biased to come up heads 70 percent of the time.
### Exercise 14-2
To test the coin, you flip it 20 more times and get 9 heads and 11 tails. Using the priors you calculated in the previous question, what are the updated posterior beliefs in the true rate of flipping a heads in terms of the 95 percent confidence interval?
## Experiments
### Replicate Figure 14-1
Hover the cursor over @fig-14-01 to compare both plots!
```{r}
#| label: fig-pb-14-1
#| fig-cap: "The beta distribution for our observations so far"
#| attr-source: '#lst-fig-pb-14-1 lst-cap="Draw beta distribution for our observations so far"'
#| out.width: 70%
#| fig.align: center
tibble::tibble(x = seq(0, 1, .0001),
y = dbeta(x, 2, 3)) |>
ggplot2::ggplot(ggplot2::aes(x = x, y = y)) +
ggplot2::geom_line() +
ggplot2::theme_bw() +
ggplot2::labs(
title = "Beta(2, 3) likelihood for possible conversion rates",
x = "Conversion rate",
y = "Density"
)
```
### Replicate Figure 14-2
Hover the cursor over @fig-14-02 to compare both plots!
```{r}
#| label: fig-pb-14-2
#| fig-cap: "CDF for our observation"
#| attr-source: '#lst-fig-pb-14-2 lst-cap="Draw CDF with 95 percent credible interval"'
#| out.width: 70%
#| fig.align: center
tibble::tibble(x = seq(0, 1, .0001),
y = pbeta(x, 2, 3)) |>
ggplot2::ggplot(ggplot2::aes(x = x, y = y)) +
ggplot2::geom_line() +
ggplot2::geom_segment(
ggplot2::aes(x = 0, y = 0.025, xend = 1, yend = 0.025),
linewidth = 0.4, linetype = "dashed", color = "steelblue") +
ggplot2::geom_segment(
ggplot2::aes(x = 0, y = 0.975, xend = 1, yend = 0.975),
linewidth = 0.4, linetype = "dashed", color = "steelblue") +
ggplot2::theme_bw() +
ggplot2::labs(
title = "CDF for Beta(2, 3)",
x = "Conversion rate",
y = "Density"
)
```
### Replicate Figure 14-3
Hover the cursor over @fig-14-03 to compare both plots!
```{r}
#| label: fig-pb-14-3
#| fig-cap: "Comparing different possible prior probabilities"
#| attr-source: '#lst-fig-pb-14-3 lst-cap="Draw different possible prior probabilities"'
#| out.width: 70%
#| fig.align: center
tibble::tibble(x = seq(0, 0.15, .0001),
y1 = dbeta(x, 1, 41),
y2 = dbeta(x, 2, 80),
y3 = dbeta(x, 5, 200)) |>
ggplot2::ggplot() +
ggplot2::geom_line(ggplot2::aes(x = x, y = y1, color = "Beta(1, 41)"), linewidth = 1.0) +
ggplot2::geom_line(ggplot2::aes(x = x, y = y2, color = "Beta(2, 80)"), linewidth = 1.0) +
ggplot2::geom_line(ggplot2::aes(x = x, y = y3, color = "Beta(5, 200)"), linewidth = 1.0) +
ggplot2::theme_bw() +
ggplot2::theme(legend.position = "top") +
ggplot2::scale_color_manual(name = "Distributions:",
values = c("Beta(1, 41)" = "red",
"Beta(2, 80)" = "blue",
"Beta(5, 200)" = "black")) +
ggplot2::labs(
title = "Possible priors for email conversion rate",
x = "Conversion rate",
y = "Density"
)
```
### Replicate Figure 14-4
We need to calculate the posterior distribution (the combination of our likelihood and our prior) by simply adding together the parameters for the two beta distributions. Remember: We started with five people that opened the email, where two of them click the link we provided.
We are going to use @eq-add-two-beta to compute the posterior distribution.
$$
\begin{align*}
Beta(\alpha_{posterior}, \beta_{posterior}) = Beta(\alpha_{likelihood} + \alpha_{prior}, \beta_{likelihood} + \beta_{prior}) \\
Beta(\alpha_{posterior}, \beta_{posterior}) = Beta(2 + 1, 3 + 41) = Beta(3, 44) \\
Beta(\alpha_{likelihood} + \beta_{likelihood}) = Beta(2, 3)
\end{align*}
$$ {#eq-calc-posterior}
Summarized we got:
- Prior: Beta(1,41)
- Likelihood: Beta(2, 3)
- Posterior: Beta(2 + 1, 3 + 41) = Beta(3, 44)
Hover the cursor over @fig-14-04 to compare both plots!
```{r}
#| label: fig-pb-14-4
#| fig-cap: "Comparing our likelihood (no prior) to our posterior (with prior)"
#| attr-source: '#lst-fig-pb-14-4 lst-cap="Draw likelihood (no prior) and posterior (with prior)"'
#| out.width: 70%
#| fig.align: center
tibble::tibble(x = seq(0, 1, .0001),
y1 = dbeta(x, 2, 3),
y2 = dbeta(x, 3, 44)) |>
ggplot2::ggplot() +
ggplot2::geom_line(ggplot2::aes(x = x, y = y1, color = "No prior"), linewidth = 1.0) +
ggplot2::geom_line(ggplot2::aes(x = x, y = y2, color = "With prior"), linewidth = 1.0) +
ggplot2::theme_bw() +
ggplot2::theme(legend.position = "top") +
ggplot2::scale_color_manual(name = "Distributions:",
values = c("With prior" = "black",
"No prior" = "red")) +
ggplot2::guides(colour = ggplot2::guide_legend(reverse = TRUE)) +
ggplot2::labs(
title = "Estimates of conversion rate with and without prior",
x = "Conversion rate",
y = "Density"
)
```
### Replicate Figure 14-5
> We wait a few hours to gather more results and now find that out of 100 people who opened your email, 25 have clicked the link!
I interpret these figures as the total of people, e.g, including the 5 people from the first count.
This time our new prior is the previous posterior and we are have to add the new prior to the new likelihood:
- New prior is the previous posterior: Beta(3, 44)
- New Likelihood: Beta(25, 75)
- New Posterior: Beta(25 + 3, 75 + 44) = Beta(28, 119)
Hover the cursor over @fig-14-05 to compare both plots!
```{r}
#| label: fig-pb-14-5
#| fig-cap: "Updating our beliefs with more data"
#| attr-source: '#lst-fig-pb-14-5 lst-cap="Update likelihood (no prior) and posterior (with prior)"'
#| out.width: 70%
#| fig.align: center
tibble::tibble(x = seq(0, 1, .0001),
y1 = dbeta(x, 25, 75),
y2 = dbeta(x, 28, 119)) |>
ggplot2::ggplot() +
ggplot2::geom_line(ggplot2::aes(x = x, y = y1, color = "No prior"), linewidth = 1.0) +
ggplot2::geom_line(ggplot2::aes(x = x, y = y2, color = "With prior"), linewidth = 1.0) +
ggplot2::theme_bw() +
ggplot2::theme(legend.position = "top") +
ggplot2::scale_color_manual(name = "Distributions:",
values = c("With prior" = "black",
"No prior" = "red")) +
ggplot2::guides(colour = ggplot2::guide_legend(reverse = TRUE)) +
ggplot2::labs(
title = "Estimates of conversion rate with and without prior",
x = "Conversion rate",
y = "Density"
)
```
::: {.callout-warning}
The red beta distribution without prior does not conform to @fig-14-05. The spike should be around 0.30 and should be just under density 8. But as far as I understand my calculation is correct. I would get a similar distribution with Beta around (25, 57), but I do not know how to get this result. My relation is $25 / 75 = 0.33$ but should have a relation of about $25 / 57 = 0.44$. I do not know how to produce this relation with an arguable calculation. Even a total of 105 people with a Beta(27, 78) would not change much (0.35)
:::
### Replicate Figure 14-6
In the morning we find that 300 subscribers have opened their email, and 86 of those have clicked through.
I will stick with my calculation procedure, even if I do not get exact the same result as in the book.
- New prior is the previous posterior: Beta(28, 119)
- New Likelihood: Beta(86, 214)
- New Posterior: Beta(86 + 28, 214 + 119) = Beta(114, 333)
Hover the cursor over @fig-14-06 to compare both plots!
```{r}
#| label: fig-pb-14-6
#| fig-cap: "The beta distribution for our observations so far"
#| attr-source: '#lst-fig-pb-14-6 lst-cap="Draw beta distribution for our observations so far"'
#| out.width: 70%
#| fig.align: center
tibble::tibble(x = seq(0, 1, .0001),
y1 = dbeta(x, 86, 214),
y2 = dbeta(x, 114, 333)) |>
ggplot2::ggplot() +
ggplot2::geom_line(ggplot2::aes(x = x, y = y1, color = "No prior"), linewidth = 1.0) +
ggplot2::geom_line(ggplot2::aes(x = x, y = y2, color = "Width prior"), linewidth = 1.0) +
ggplot2::theme_bw() +
ggplot2::theme(legend.position = "top") +
ggplot2::scale_color_manual(name = "Distributions:",
values = c("Width prior" = "black",
"No prior" = "red")) +
ggplot2::guides(colour = ggplot2::guide_legend(reverse = TRUE)) +
ggplot2::labs(
title = "Estimates of conversion rate with and without prior",
x = "Conversion rate",
y = "Density"
)
```
This time my graph looks pretty similar as @fig-14-06!
### Replicate Figure 14-7
Hover the cursor over @fig-14-07 to compare both plots!
```{r}
#| label: fig-pb-14-7a
#| fig-cap: "The noninformative prior Beta(1,1)"
#| attr-source: '#lst-fig-pb-14-7 lst-cap="Draw non-informative prior Beta(1,1)"'
#| out.width: 70%
#| fig.align: center
tibble::tibble(x = seq(0, 1, .0001),
y = dbeta(x, 1, 1)) |>
ggplot2::ggplot(ggplot2::aes(x = x, y = y)) +
ggplot2::geom_line() +
ggplot2::theme_bw() +
ggplot2::labs(
title = "Non-inforamtive prior Beta(1, 1)",
x = "Conversion rate",
y = "Density"
)
```
Drawing a straight line is also done with `dunif(x, min = 0, max = 1)` (= the uniform distribution). As `min = 0, max = 1`are the default values you just can say `dunif(x)` as @fig-pb-14-7b demonstrates:
```{r}
#| label: fig-pb-14-7b
#| fig-cap: "The noninformative with the uniform distribution"
#| attr-source: '#lst-fig-pb-14-7b lst-cap="Draw non-informative with the uniform distribution"'
#| out.width: 70%
#| fig.align: center
tibble::tibble(x = seq(0, 1, .0001),
y = dunif(x)) |>
ggplot2::ggplot(ggplot2::aes(x = x, y = y)) +
ggplot2::geom_line() +
ggplot2::theme_bw() +
ggplot2::labs(
title = "Non-inforamtive prior with dunif(x, min = 0, max = 1)",
x = "Conversion rate",
y = "Density"
)
```