---
format: html
execute: 
  cache: true
filters: 
  - quarto
  - nameref
---

# Geocentric Models {#sec-chap04}

::: my-objectives
::: my-objectives-header
Learning Objectives:
:::

::: my-objectives-container
> "This chapter introduces `r glossary("linear regression")` as a
> Bayesian procedure. Under a probability interpretation, which is
> necessary for Bayesian work, linear regression uses a Gaussian
> (normal) distribution to describe our `r glossary("golem")`'s
> uncertainty about some measurement of interest." ([McElreath, 2020, p.
> 71](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=90&annotation=PHU5R9MI))
:::
:::


## Why normal distributions? {#sec-why-normal-dist-a}

Why are so many distributions approximately normal, tracing out a
Gaussian curve? Because many more combinations of outcomes sum to a
"central" value than to some extreme value.

::: my-important
::: my-important-header
Why are normal distributions normal?
:::

::: my-important-container
> "Any process that adds together random values from the same
> distribution converges to a normal." ([McElreath, 2020, p.
> 73](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=92&annotation=F8VYZH4I))
:::
:::

### Normal by addition

> "Whatever the average value of the source distribution, each sample
> from it can be thought of as a fluctuation from that average value.
> When we begin to add these fluctuations together, they also begin to
> cancel one another out. A large positive fluctuation will cancel a
> large negative one. The more terms in the sum, the more chances for
> each fluctuation to be canceled by another, or by a series of smaller
> ones in the opposite direction. So eventually the most likely sum, in
> the sense that there are the most ways to realize it, will be a sum in
> which every fluctuation is canceled by another, a sum of zero
> (relative to the mean)." ([McElreath, 2020, p.
> 73](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=92&annotation=PIK9MN9I))

> "It doesn't matter what shape the underlying distribution possesses.
> It could be uniform, like in our example above, or it could be
> (nearly) anything else. Depending upon the underlying distribution,
> the convergence might be slow, but it will be inevitable. Often, as in
> this example, convergence is rapid." ([McElreath, 2020, p.
> 74](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=93&annotation=GLCXRF8V))

::: my-resource
::: my-resource-header
Resources: Why normal distributions?
:::

::: my-resource-container
See the excellent article [Why is normal distribution so
ubiquitous?](https://ekamperi.github.io/mathematics/2021/01/29/why-is-normal-distribution-so-ubiquitous.html)
which also explains the random-walk example from SR2. See also the
scientific paper [Why are normal distributions
normal?](https://www.journals.uchicago.edu/doi/pdf/10.1093/bjps/axs046)
in The British Journal for the Philosophy of Science.
:::
:::
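
The cancellation argument quoted above can be tried out with a small simulation. This is only a sketch: the number of walks (1000), the number of steps (16), and the seed are values I chose for illustration. Each walk adds uniform steps between -1 and 1, and the resulting sums are compared with what a Gaussian would predict.

```{r}
#| label: chap04-sketch-normal-by-addition

set.seed(4)  # arbitrary seed, only for reproducibility

# 1000 random walks: each final position is the sum of 16 uniform steps in [-1, 1]
n_walks <- 1000
n_steps <- 16
final_pos <- replicate(n_walks, sum(runif(n_steps, min = -1, max = 1)))

# Theory for a sum of independent Uniform(-1, 1) steps:
# mean 0 and variance n_steps * (2^2 / 12) = n_steps / 3
c(mean_sim  = mean(final_pos),
  sd_sim    = sd(final_pos),
  sd_theory = sqrt(n_steps / 3))

# Share of walks within one standard deviation of zero (Gaussian: about 0.68)
share_within_1sd <- mean(abs(final_pos) <= sqrt(n_steps / 3))
share_within_1sd
```

Although every single step is uniform, the share of sums within one standard deviation should already be close to the Gaussian value.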

### Normal by multiplication

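Normality arises not only from addition but also from the multiplication
of small effects: when each factor deviates only slightly from 1,
multiplying the factors is approximately the same as adding their
deviations, for example $(1 + a)(1 + b) \approx 1 + a + b$ for small $a$
and $b$.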
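
A minimal sketch of this idea, assuming 12 multiplicative effects that each increase growth by between 0% and 10%. The effect sizes, the number of effects, and all object names are my own choices for illustration.

```{r}
#| label: chap04-sketch-normal-by-multiplication

set.seed(4)  # arbitrary seed

# Two small effects: multiplying is almost the same as adding
effect_1 <- 0.02
effect_2 <- 0.03
c(product    = (1 + effect_1) * (1 + effect_2),
  sum_approx = 1 + effect_1 + effect_2)

# 10,000 products of 12 growth factors, each between 1.0 and 1.1
growth_small <- replicate(1e4, prod(1 + runif(12, min = 0, max = 0.1)))

# Mean and median nearly coincide, as expected for a roughly symmetric,
# bell-shaped distribution
c(mean = mean(growth_small), median = median(growth_small))
```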

### Normal by log-multiplication

But even the product of large multiplicative effects tends to produce a
Gaussian distribution once it is measured on the log scale, because the
logarithm of a product is the sum of the logarithms.
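
The following sketch repeats the multiplication experiment with larger effects (growth factors between 1.0 and 1.5, an assumption for illustration) and compares the raw scale with the log scale.

```{r}
#| label: chap04-sketch-normal-by-log-multiplication

set.seed(4)  # arbitrary seed

# 10,000 products of 12 growth factors, each between 1.0 and 1.5
growth_big <- replicate(1e4, prod(1 + runif(12, min = 0, max = 0.5)))
log_growth_big <- log(growth_big)

# On the raw scale the mean is pulled above the median (right skew);
# on the log scale the two nearly coincide
c(raw_mean   = mean(growth_big),
  raw_median = median(growth_big),
  log_mean   = mean(log_growth_big),
  log_median = median(log_growth_big))
```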

### Using Gaussian distributions

The justifications for using the Gaussian distribution fall into two
broad categories:

1.  **Ontological justification**: The Gaussian distribution is a
    widespread pattern, appearing again and again at different scales
    and in different domains.
2.  **Epistemological justification**: When all we know is the mean and
    variance of a distribution, the Gaussian distribution is the one
    most consistent with these assumptions.

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Many processes have heavier tails than the Gaussian
distribution
:::

::: my-watch-out-container
> "... the Gaussian distribution has some very thin tails---there is
> very little probability in them. Instead most of the mass in the
> Gaussian lies within one standard deviation of the mean. Many natural
> (and unnatural) processes have much heavier tails." ([McElreath, 2020,
> p. 76](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=95&annotation=LFRDF45M))
:::
:::

::: my-definition
::: my-definition-header
::: {#def-mass-density}
: Probability mass and probability density
:::
:::

::: my-definition-container
-   Probability distributions with only discrete outcomes, like the
    binomial, are called `r glossary("probability mass function")`s and
    denoted `Pr`.

-   Continuous ones like the Gaussian are called
    `r glossary("probability density function")`s, denoted with $p$ or
    just plain old $f$, depending upon author and tradition.

> "Probability *density* is the rate of change in cumulative
> probability. So where cumulative probability is increasing rapidly,
> density can easily exceed 1. But if we calculate the area under the
> density function, it will never exceed 1. Such areas are also called
> *probability mass*." ([McElreath, 2020, p.
> 76](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=95&annotation=577IRETG))

For example, `dnorm(0,0,0.1)`, which is how R calculates
$p(0 \mid 0, 0.1)$, results in `r dnorm(0,0,0.1)`, a density greater
than 1.
:::
:::
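
The difference between density and mass can be checked numerically. This is a small sketch in base R: the integration bounds of -1 and 1 are my choice (ten standard deviations on each side, so practically the whole distribution).

```{r}
#| label: chap04-sketch-density-vs-mass

# Density of Normal(0, 0.1) at its mean: 1 / (0.1 * sqrt(2 * pi)), well above 1
dens_at_mean <- dnorm(0, mean = 0, sd = 0.1)
dens_at_mean

# The area under the same density curve (here from -1 to 1) is still 1
area_total <- integrate(dnorm, lower = -1, upper = 1, mean = 0, sd = 0.1)$value
area_total

# Probability mass of an interval = area under the density over that interval
mass_interval <- pnorm(0.1, mean = 0, sd = 0.1) - pnorm(-0.1, mean = 0, sd = 0.1)
mass_interval
```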

> "The Gaussian distribution is routinely seen without σ but with
> another parameter, $\tau$ . The parameter $\tau$ in this context is
> usually called *precision* and defined as $\tau = 1/σ^2$. When
> $\sigma$ is large, $\tau$ is small." ([McElreath, 2020, p.
> 76](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=95&annotation=4UW673GU))

> "This form is common in Bayesian data analysis, and Bayesian model
> fitting software, such as `r glossary("BUGS")` or
> `r glossary("JAGS")`, sometimes requires using $\tau$ rather than
> $\sigma$." ([McElreath, 2020, p.
> 76](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=95&annotation=9VQCJW2G))
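
A short sketch of the conversion between the two parametrizations, using three arbitrary example values for $\sigma$. R's `dnorm()` expects a standard deviation, so a precision has to be transformed back first.

```{r}
#| label: chap04-sketch-precision

sigma_values <- c(0.1, 1, 10)  # arbitrary example values

# Precision as defined in the quote: tau = 1 / sigma^2
tau_values <- 1 / sigma_values^2

# Back-transformation for functions that expect a standard deviation
sigma_back <- 1 / sqrt(tau_values)

data.frame(sigma = sigma_values, tau = tau_values, sigma_back = sigma_back)
```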

## Model describing language

::: my-procedure
::: my-procedure-header
::: {#prp-model-language}
: General recipe for describing models
:::
:::

::: my-procedure-container
1.  First, we recognize a set of variables to work with. Some of these
    variables are observable. We call these data. Others are
    unobservable things like rates and averages. We call these
    parameters.
2.  We define each variable either in terms of the other variables or in
    terms of a probability distribution.
3.  The combination of variables and their probability distributions
    defines a joint generative model that can be used both to simulate
    hypothetical observations and to analyze real ones. ([McElreath,
    2020, p. 77](zotero://select/groups/5243560/items/NFUEVASQ))
    ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=96&annotation=IC4KGA5D))
:::
:::

::: my-definition
::: my-definition-header
::: {#def-model}
: What are statistical models?
:::
:::

::: my-definition-container
`r glossary("Statistical Model", "Models")` are "mappings of one set of
variables through a probability distribution onto another set of
variables." ([McElreath, 2020, p.
77](zotero://select/groups/5243560/items/NFUEVASQ))
([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=96&annotation=NSVA3R2F))
:::
:::

### Re-describing the globe tossing model

::: my-example
::: my-example-header
::: {#exm-formula-glob-tossing-model}
: Describe the globe tossing model from @sec-chap03
:::
:::

::: my-example-container
$$
\begin{align*}
W \sim \operatorname{Binomial}(N, p) \space \space (1)\\
p \sim \operatorname{Uniform}(0, 1)  \space \space (2)
\end{align*}
$$ {#eq-globe-tossing-model}

-   `W`: observed count of water
-   `N`: total number of tosses
-   `p`: proportion of water on the globe

The first line in this kind of model always defines the likelihood
function used in `r glossary("Bayes’ theorem")`. The other lines define
priors.

Read the above statement as:

1.  **First line**: The count `W` is distributed binomially with sample
    size `N` and probability `p`.
2.  **Second line**: The prior for `p` is assumed to be uniform between
    zero and one.

------------------------------------------------------------------------

Both of the lines in the model of @eq-globe-tossing-model are
`r glossary("stochastic")`, as indicated by the `~` symbol. A stochastic
relationship is just a mapping of a variable or parameter onto a
distribution. It is stochastic because no single instance of the
variable on the left is known with certainty. Instead, the mapping is
probabilistic: Some values are more plausible than others, but very many
different values are plausible under any model. Later, we'll have models
with deterministic definitions in them.
:::
:::
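
Because the two lines define a joint generative model, they can be simulated directly, reading the model from the bottom up. The sketch below assumes 9 tosses, a value chosen for illustration. All object names are hypothetical.

```{r}
#| label: chap04-sketch-globe-generative

set.seed(4)  # arbitrary seed
n_sim    <- 1e4
n_tosses <- 9  # assumed number of tosses

# Line (2): draw the proportion of water from the uniform prior
p_sim <- runif(n_sim, min = 0, max = 1)

# Line (1): draw a water count for each simulated proportion
w_sim <- rbinom(n_sim, size = n_tosses, prob = p_sim)

# Relative frequency of each possible count 0, 1, ..., 9
w_freq <- table(factor(w_sim, levels = 0:n_tosses)) / n_sim
round(w_freq, 3)
```

Under a uniform prior every count from 0 to `n_tosses` is equally likely before seeing data, so the simulated frequencies should all be close to 1/10.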

## Gaussian model of height {#sec-gaussian-model-of-height}

In this section we model a single measurement variable as a Gaussian
distribution. This prepares the ground for the linear regression model
in @sec-linear-prediction-a, where we will construct a predictor
variable and add it to the model.

> "There will be two parameters describing the distribution's shape, the
> `r glossary("arithmetic mean", "mean")` `μ` and the
> `r glossary("standard deviation")` `σ`.
> `r glossary("Bayesian updating")` will allow us to consider every
> possible combination of values for μ and σ and to score each
> combination by its relative plausibility, in light of the data.
> These relative plausibilities are the
> `r glossary("posterior probability", "posterior probabilities")` of
> each combination of values μ, σ." ([McElreath, 2020, p.
> 78/79](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=98&annotation=RAYXZIGU))

### The data

::: my-resource
::: my-resource-header
Resource: Nancy Howell data
:::

::: my-resource-container
-   Howell, N. (2001). Demography of the Dobe !Kung (2nd ed.).
    Routledge.
-   Howell, N. (2010). Life Histories of the Dobe !Kung: Food, Fatness,
    and Well-being over the Life Span (Vol. 4). University of California
    Press.

The data contained in `data(Howell1)` are partial census data for the
Dobe area !Kung San, compiled from interviews conducted by Nancy Howell
in the late 1960s.

Much more raw data is available for download from the [University of
Toronto
Library](https://tspace.library.utoronto.ca/simple-search?query=nancy+howell&filter_field_1=author&filter_type_1=equals&filter_value_1=Howell%2C+Nancy&sort_by=score&order=desc&rpp=10&etal=0&start=0).

> "For the non-anthropologists reading along, the !Kung San are the most
> famous foraging population of the twentieth century, largely because
> of detailed quantitative studies by people like Howell." ([McElreath,
> 2020, p. 79](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=98&annotation=XUZLTB86))
:::
:::

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Loading data without attaching the package with `library()`
:::

::: my-watch-out-container
::: panel-tabset
###### Standard

Loading data from a package with a plain call like `data(Howell1)` is
only possible if you have already attached the package with `library()`.

::: my-r-code
::: my-r-code-header
::: {#cnj-standard-data-loading}
: Data loading from a package -- Standard procedure
:::
:::

::: my-r-code-container
```{r}
#| label: loading-data-from-package1_a
#| eval: false


## R code 4.7 not executed! #######################
# library(rethinking)
# data(Howell1)
# d_a <- Howell1

```

In this book the standard way of loading data from a package,

```
library(rethinking)
data(Howell1)
```

is not executed: I want to prevent clashes that arise when
{**rethinking**} and {**brms**} are loaded at the same time, because
several of their functions share the same names.
:::
:::

###### Unusual (Original)

Because of many function name conflicts with {**brms**} I do not want to
load {**rethinking**} and will call the functions of these conflicting
packages with `<package name>::<function name>()`. Therefore I have to
use another, less common strategy for loading the data set.

::: my-r-code
::: my-r-code-header
::: {#cnj-unusual-data-loading-a}
a: Data loading from a package -- Unusual procedure (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: loading-data-from-package2_a

data(package = "rethinking", list = "Howell1")
d_a <- Howell1
head(d_a)
```
:::
:::

The advantage of this unusual strategy is that I do not have to detach
the {**rethinking**} package, or check that it is detached, before
using {**brms**}, as is necessary in Kurz's {**tidyverse**} /
{**brms**} version.

###### Unusual (Tidyverse)

Because of many function name conflicts with {**brms**} I do not want to
load {**rethinking**} and will call the functions of these conflicting
packages with `<package name>::<function name>()`. Therefore I have to
use another, less common strategy for loading the data set.

::: my-r-code
::: my-r-code-header
::: {#cnj-unusual-data-loading-b}
b: Data loading from a package -- Unusual procedure (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: loading-data-from-package_2b


data(package = "rethinking", list = "Howell1")
d_b <- tibble::as_tibble(Howell1)
head(d_b)
```
:::
:::

The advantage of this unusual strategy is that I do not have to detach
the {**rethinking**} package, or check that it is detached, before
using {**brms**}, as is necessary in Kurz's {**tidyverse**} /
{**brms**} version.
:::
:::
:::
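
The same loading mechanism can be tried without {**rethinking**} installed. The sketch below uses the `women` data set from the {**datasets**} package that ships with R, purely as a stand-in. Loading into a separate environment via the `envir` argument is an optional variation of mine, not something the chunks above do.

```{r}
#| label: chap04-sketch-data-without-attach

# `women` from {datasets} is only a stand-in for a packaged data set
pkg_env <- new.env()
utils::data(list = "women", package = "datasets", envir = pkg_env)

# The data set now lives in `pkg_env`; no package had to be attached for this
ls(pkg_env)
str(pkg_env$women)
```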

#### Show the data

::: my-example
::: my-example-header
::: {#exm-show-data}
: Show and inspect the data
:::
:::

::: my-example-container
::: panel-tabset
###### str()

::: my-r-code
::: my-r-code-header
::: {#cnj-show-data-str-a}
a: Compactly Display the Structure of an Arbitrary R Object (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: show-howell-data-str-a
#| results: hold

## R code 4.8 ####################
str(d_a)
```

`utils::str()` compactly displays the internal **str**ucture of any
reasonable R object.

Our Howell1 data contains four columns. Each column has 544 entries, so
there are 544 individuals in these data. Each individual has a recorded
height (centimeters), weight (kilograms), age (years), and "maleness" (0
indicating female and 1 indicating male).
:::
:::

###### precis()

::: my-r-code
::: my-r-code-header
::: {#cnj-show-data-precis-a}
a: Displays concise parameter estimate information for an existing model
fit (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: show-howell-data-precis-a
#| results: hold

## R code 4.9 ###################
rethinking::precis(d_a)
```

`rethinking::precis()` creates a table of estimates and standard errors,
with optional confidence intervals and parameter correlations.

In this case we see the mean, the standard deviation, the 5.5% and 94.5%
quantiles (the bounds of an 89% interval), and a small histogram for
each of the four variables: height (centimeters), weight (kilograms),
age (years), and "maleness" (0 indicating female and 1 indicating male).

Additionally there is some console output. In our case:
`'data.frame': 544 obs. of 4 variables:`.
:::
:::

###### glimpse()

::: my-r-code
::: my-r-code-header
::: {#cnj-show-data-glimpse-b}
b: Get a glimpse of your data (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: show-howell-data-glimpse-b

d_b |>
    dplyr::glimpse()
```

`pillar::glimpse()` is re-exported by {**dplyr**} and is the tidyverse
analogue for `str()`. It works like a transposed version of `print()`:
columns run down the page, and data runs across.

`dplyr::glimpse()` shows that the Howell1 data contains four columns.
Each column has 544 entries, so there are 544 individuals in these data.
Each individual has a recorded height (centimeters), weight (kilograms),
age (years), and "maleness" (0 indicating female and 1 indicating male).
:::
:::

###### summary()

::: my-r-code
::: my-r-code-header
::: {#cnj-show-data-summary-b}
b: Object summaries (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: show-howell-data-summary-b

d_b |> 
    base::summary()
```

Kurz tells us that the {**brms**} package does not have a function that
works like `rethinking::precis()` for providing numeric and graphical
summaries of variables, as seen in @cnj-show-data-precis-a. Kurz
therefore suggests using `base::summary()` to get some of the
information that `rethinking::precis()` provides.
:::
:::

###### skim()

::: my-r-code
::: my-r-code-header
::: {#cnj-show-data-skim-b}
b: Skim a data frame, getting useful summary statistics
:::
:::

::: my-r-code-container
I think `skimr::skim()` is a better alternative to
`rethinking::precis()` than `base::summary()` because it also provides a
graphical summary of the variables. {**skimr**} has many other useful
functions and is very adaptable. I suggest installing it and trying it
out.

```{r}
#| label: show-howell-data-skim-b

d_b |>            
    skimr::skim() 
```
:::
:::

###### slice_sample()

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-show-data-slice-sample}
: Show a random sample of data records
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: chap04-show-data-slice-sample

set.seed(4)
d_b |> 
  dplyr::slice_sample(n = 6)
```

::::
:::::


###### as_tbl_obs()

::: my-r-code
::: my-r-code-header
::: {#cnj-show-data-as-tbl-obs-b}
b: Randomly select a small number of observations and put it into
`knitr::kable()`
:::
:::

::: my-r-code-container
```{r}
#| label: show-howell-data-as-tbl-obs-b
#| warning: false

set.seed(4)
d_b |>            
    bayr::as_tbl_obs() 
```

I just learned another method to print variables from a data frame. Base
R offers `utils::head()` and `utils::tail()`, with the disadvantage that
the start or the end of a data file could be atypical for the variable
values. The standard tibble printing method has the same problem. In
contrast, `bayr::as_tbl_obs()` prints a random selection of at most 8
rows as compact and nice output that works in both the console and
{**knitr**} output.

`bayr::as_tbl_obs()` does not give a data *summary* as discussed here in
@exm-show-data, but I wanted to mention this printing method because I
have always looked for an easy way to display a representative sample of
values from a data frame.
:::
:::

###### print()

::: my-r-code
::: my-r-code-header
::: {#cnj-show-data-print-as-tibble-b}
b: Show data with the internal printing method of tibbles
:::
:::

::: my-r-code-container
```{r}
#| label: show-howell-data-print-as-tibble-b

print(d_b, n = 10)
```

Another possibility is to use the `tbl_df` internal printing method, one
of the main features of tibbles. Printing can be tweaked for a one-off
call by calling `print()` explicitly and setting arguments like `n` and
`width`. More persistent control is available by setting the options
described in `pillar::pillar_options`.

Again, this printing method does not give a data summary as featured in
@exm-show-data. But it is an easy method, especially if you are already
working with tibbles, and sometimes it is enough to get a sense of the
data.
:::
:::
:::
:::
:::

#### Select the height data of adults

> "All we want for now are heights of adults in the sample. The reason
> to filter out nonadults for now is that height is strongly correlated
> with age, before adulthood." ([McElreath, 2020, p.
> 80](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=99&annotation=VCYRP6W4))

::: my-example
::: my-example-header
::: {#exm-adult-data}
: Select the height data of adults (individuals aged 18 years or older)
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-adult-data-a}
a: Select individuals aged 18 years or older (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: select-height-adults-a

## R code 4.11a ###################
d2_a <- d_a[d_a$age >= 18, ]
str(d2_a)
```
:::
:::

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-adult-data-b}
b: Select individuals aged 18 years or older (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: select-height-adults-b

## R code 4.11b ###################
d2_b <- 
  d_b |> 
  dplyr::filter(age >= 18) 

dplyr::glimpse(d2_b)
```
:::
:::
:::
:::
:::
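
The two selection styles behave the same as long as `age` has no missing values, but they can differ when it does. The sketch below uses a tiny made-up data frame (hypothetical values, not the Howell1 records) and base R only. As far as I know, `dplyr::filter()` drops rows whose condition is `NA`, just like `subset()` does here.

```{r}
#| label: chap04-sketch-filter-with-na

# Hypothetical toy data with one missing age
toy_heights <- data.frame(
  height = c(151.8, 139.7, 121.9, 156.8, 97.8),
  age    = c(63, 41, 12, NA, 5)
)

# Bracket indexing returns an all-NA row where the condition is NA
adults_bracket <- toy_heights[toy_heights$age >= 18, ]

# subset() silently drops rows where the condition is NA
adults_subset <- subset(toy_heights, age >= 18)

c(bracket = nrow(adults_bracket), subset = nrow(adults_subset))
```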

### The model

Our goal is to model the data using a Gaussian distribution.

#### Plot the distribution of heights

::: my-example
::: my-example-header
::: {#exm-plot-height-dist}
: Plotting the distribution of height
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-plot-height-dist-a}
a: Plot the distribution of the heights of adults, overlaid by an ideal
Gaussian distribution (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-dist-heights-a
#| fig-cap: "The distribution of the heights data, overlaid by an ideal Gaussian distribution (Original)"

rethinking::dens(d2_a$height, adj = 1, norm.comp = TRUE)
```
:::
:::

With the option `norm.comp = TRUE` I have overlaid a Gaussian
distribution to see how it differs from the actual data. There are some
local differences, especially at the peak of the distribution. But the
tails look fine, and the overall impression of the curve is Gaussian.

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-plot-height-dist-b}
b: Plot the distribution of the heights of adults, overlaid by an ideal
Gaussian distribution (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-dist-heights-b
#| fig-cap: "The distribution of the heights data, overlaid by an ideal Gaussian distribution: tidyverse version"

d2_b |> 
    ggplot2::ggplot(ggplot2::aes(height)) +
    ggplot2::geom_density() +

    ggplot2::stat_function(
        fun = dnorm,
        args = with(d2_b, c(mean = mean(height), sd = sd(height)))
        ) +
    ggplot2::labs(
      x = "Height in cm",
      y = "Density"
    ) +
    ggplot2::theme_bw()
  
```
:::
:::

The plot of the heights distribution compared with an ideal Gaussian
distribution is missing in Kurz's version. I added this plot after some
internet research, using the last example of [How to Plot a Normal
Distribution in
R](https://www.statology.org/plot-normal-distribution-r/). It uses
`ggplot2::stat_function()` to compute and draw a function as a
continuous curve. This makes it easy to superimpose a function on top of
an existing plot.
:::
:::
:::

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Looking at the raw data is not enough for a model decision
:::

::: my-watch-out-container
> "Gawking at the raw data, to try to decide how to model them, is
> usually not a good idea. The data could be a mixture of different
> Gaussian distributions, for example, and in that case you won't be
> able to detect the underlying normality just by eyeballing the outcome
> distribution." ([McElreath, 2020, p.
> 81](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=100&annotation=DMU8TC6L))
:::
:::
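
A small simulation illustrates the warning. It is only a sketch with made-up numbers: two hypothetical groups, each perfectly Gaussian, with means of 150 and 170 and a common standard deviation of 5.

```{r}
#| label: chap04-sketch-gaussian-mixture

set.seed(4)  # arbitrary seed

# Hypothetical mixture of two Gaussian groups with different means
group_1 <- rnorm(500, mean = 150, sd = 5)
group_2 <- rnorm(500, mean = 170, sd = 5)
mixture <- c(group_1, group_2)

# Count observations within 2.5 units of a given center
count_near <- function(x, center, half_width = 2.5) {
  sum(abs(x - center) <= half_width)
}

# Two peaks with a valley in between: the pooled outcome does not look Gaussian
mixture_counts <- c(near_150 = count_near(mixture, 150),
                    near_160 = count_near(mixture, 160),
                    near_170 = count_near(mixture, 170))
mixture_counts
```

Each group is normal by construction, yet the pooled values have fewer observations around the overall center than around the two group means.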

::: my-theorem
::: my-theorem-header
::: {#thm-def-heights}
: Define the heights as normally distributed with a mean $\mu$ and
standard deviation $\sigma$
:::
:::

::: my-theorem-container
$$
h_{i} \sim \operatorname{Normal}(\mu, \sigma)
$$ {#eq-height-normal-dist}

-   **Symbol h**: refers to the list of heights
-   **Subscript i**: refers to each individual element of the list. It
    is conventional to use $i$ because it stands for index. The index
    $i$ takes on row numbers, and so in this example can take any value
    from 1 to 352 (the number of heights in `d2_a$height`). As such, the
    model above is saying that all the golem knows about each height
    measurement is defined by the same normal distribution, with mean
    $\mu$ and standard deviation $\sigma$.

@eq-height-normal-dist assumes that the values $h_{i}$ are
`r glossary("i.i.d.")` (independent and identically distributed).
:::
:::
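
Written out, the i.i.d. assumption behind @eq-height-normal-dist means that the joint density of all heights factorizes into one identical term per individual. The following is a sketch in the notation used above, with $n = 352$ adults and $p$ denoting a probability density:

$$
p(h_1, \dots, h_n \mid \mu, \sigma) = \prod_{i=1}^{n} \operatorname{Normal}(h_i \mid \mu, \sigma)
$$

On the log scale the product becomes a sum of $n$ log densities, which is how such a likelihood is usually computed in practice.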

> "The i.i.d. assumption doesn't have to seem awkward, as long as you
> remember that probability is inside the golem, not outside in the
> world. The i.i.d. assumption is about how the golem represents its
> uncertainty. It is an *epistemological* assumption. It is not a
> physical assumption about the world, an *ontological* one. E. T.
> Jaynes (1922--1998) called this the *mind projection fallacy*, the
> mistake of confusing epistemological claims with ontological claims."
> ([McElreath, 2020, p.
> 81](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=100&annotation=VFIF5ITN))

> "To complete the model, we're going to need some priors. The
> parameters to be estimated are both $\mu$ and $\sigma$, so we need a
> prior $Pr(\mu, \sigma)$, the joint prior probability for all
> parameters. In most cases, priors are specified independently for each
> parameter, which amounts to assuming
> $Pr(\mu, \sigma) = Pr(\mu)Pr(\sigma)$." ([McElreath, 2020, p.
> 82](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=101&annotation=5HVI6LB4))

::: my-theorem
::: my-theorem-header
::: {#thm-linear-heights-model}
: Define the linear heights model
:::
:::

::: my-theorem-container
$$
\begin{align*}
h_{i} \sim \operatorname{Normal}(\mu, \sigma) \space \space (1) \\ 
\mu \sim \operatorname{Normal}(178, 20)  \space \space (2) \\ 
\sigma \sim \operatorname{Uniform}(0, 50)   \space \space (3)      
\end{align*}
$$ {#eq-height-linear-model-m4-1}

------------------------------------------------------------------------

1.  First line represents the likelihood.
2.  Second line is the chosen $\mu$ (mu, mean) prior. It is a broad
    Gaussian prior, centered on 178 cm, with 95% of probability between
    178 ± 40 cm.
3.  Third line is the chosen $\sigma$ (sigma, standard deviation) prior.
:::
:::
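
To see how the three lines work together, the sketch below scores single combinations of $\mu$ and $\sigma$ by adding the log likelihood and the two log priors. This is not the book's code: the function name `log_post_sketch()` is hypothetical, and the heights are simulated stand-ins (mean 155, standard deviation 8, both assumed values), not the Howell1 data.

```{r}
#| label: chap04-sketch-log-posterior

set.seed(4)  # arbitrary seed

# Simulated stand-in for 352 adult heights (assumed mean and sd)
h_sim <- rnorm(352, mean = 155, sd = 8)

# Unnormalized log posterior = log likelihood + log prior(mu) + log prior(sigma)
log_post_sketch <- function(mu, sigma, h) {
  sum(dnorm(h, mean = mu, sd = sigma, log = TRUE)) +  # line (1): likelihood
    dnorm(mu, mean = 178, sd = 20, log = TRUE) +      # line (2): prior for mu
    dunif(sigma, min = 0, max = 50, log = TRUE)       # line (3): prior for sigma
}

log_post_sketch(mu = 155, sigma = 8,  h = h_sim)  # close to the simulated values
log_post_sketch(mu = 178, sigma = 8,  h = h_sim)  # at the prior mean, far from the data
log_post_sketch(mu = 155, sigma = 60, h = h_sim)  # outside the sigma prior: -Inf
```

Adding the two log priors relies on the independence assumption $Pr(\mu, \sigma) = Pr(\mu)Pr(\sigma)$ quoted above.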

Let's think about the chosen values for the priors in more detail:

**1. Choosing the mean prior**

-   **Why normal distribution?**: As stated before in @thm-def-heights,
    we assume that the heights of adults follow a Gaussian
    distribution.
-   **Why 178cm?**: "Your author is 178 cm tall. And the range from 138
    cm to 218 cm encompasses a huge range of plausible mean heights for
    human populations. So domain-specific information has gone into this
    prior." ([McElreath, 2020, p.
    82](zotero://select/groups/5243560/items/NFUEVASQ))
    ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=101&annotation=DUN4YP7H))
-   **Why 95% and 40cm?**: 40 cm is twice the standard deviation of the
    prior ($2 \times 20$ cm), and 95% refers to the [68--95--99.7
    rule](https://en.wikipedia.org/w/index.php?title=68%E2%80%9395%E2%80%9399.7_rule&oldid=1187581793)
    that helps to remember what percentage of values lies within one,
    two, and three standard deviations of the mean in a normal
    distribution:

$$
\begin{align}
  \Pr(\mu-1\sigma \le X \le \mu+1\sigma) & \approx 68.27\% \\
  \Pr(\mu-2\sigma \le X \le \mu+2\sigma) & \approx 95.45\% \\
  \Pr(\mu-3\sigma \le X \le \mu+3\sigma) & \approx 99.73\%
\end{align}
$$
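
These percentages can be reproduced with `stats::pnorm()`. The second part of the sketch applies the rule to the prior for $\mu$ and computes the mass between 138 cm and 218 cm. Object names are my own.

```{r}
#| label: chap04-sketch-68-95-997

# Probability within k standard deviations of the mean, for k = 1, 2, 3
k_sd <- 1:3
within_k_sd <- pnorm(k_sd) - pnorm(-k_sd)
round(100 * within_k_sd, 2)

# Applied to the prior mu ~ Normal(178, 20): mass between 138 cm and 218 cm
mass_138_218 <- pnorm(218, mean = 178, sd = 20) - pnorm(138, mean = 178, sd = 20)
mass_138_218
```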

**2. Choosing the sigma prior**

-   **Why uniform distribution?**: We assume `r glossary("i.i.d.")`,
    i.e., the standard deviation is identical across the whole
    distribution.
-   **Why 0 as lower limit?**: "A standard deviation like σ must be
    positive, so bounding it at zero makes sense." ([McElreath, 2020, p.
    82](zotero://select/groups/5243560/items/NFUEVASQ))
    ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=101&annotation=9KXSZF9A))
-   **Why 50 as upper limit?**: "... a standard deviation of 50 cm would
    imply that 95% of individual heights lie within 100 cm of the
    average height." ([McElreath, 2020, p.
    82](zotero://select/groups/5243560/items/NFUEVASQ))
    ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=101&annotation=5653WGIL))
    That is a range large enough to include the plausible variation in
    heights.

::: my-important
::: my-important-header
Plot the chosen priors!
:::

::: my-important-container
It is important to plot the priors to get an idea about the assumptions
they build into your model.
:::
:::

::: my-example
::: my-example-header
::: {#exm-ID-text}
: Plot the chosen priors
:::
:::

::: my-example-container
::: panel-tabset
###### $\mu$ (Original)

::: my-r-code
::: my-r-code-header
::: {#cnj-code-name-a}
a: Plot the chosen mean prior (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-mean-prior-a
#| fig-cap: "Plot of the chosen mean prior (Original)"

## R code 4.12 ###############################
graphics::curve(stats::dnorm(x, 178, 20), from = 100, to = 250)
```
:::
:::

You can see that the golem is assuming that the average height (not each
individual height) is almost certainly between 140 cm and 220 cm. So
this prior carries a little information, but not a lot.

###### $\sigma$ (Original)

::: my-r-code
::: my-r-code-header
::: {#cnj-code-name-b}
a: Plot chosen prior for the standard deviation (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-sd-prior-a
#| fig-cap: "Plot the chosen prior for the standard deviation (Original)"

## R code 4.13 ###########################
graphics::curve(stats::dunif(x, 0, 50), from = -10, to = 60)
```
:::
:::

###### $\mu$ (Tidyverse)

::: my-r-code
::: my-r-code-header
<div>

b: Plot the chosen mean prior (Tidyverse)

</div>
:::

::: my-r-code-container
```{r}
#| label: fig-mean-prior-b
#| fig-cap: "Plot of the chosen mean prior (Tidyverse)"
#| fig-height: 4
#| fig-width: 6

tibble::tibble(x = base::seq(from = 100, to = 250, by = .1)) |> 
    
  ggplot2::ggplot(ggplot2::aes(x = x, y = stats::dnorm(x, mean = 178, sd = 20))) +
  ggplot2::geom_line() +
  ggplot2::scale_x_continuous(breaks = base::seq(from = 100, to = 250, by = 25)) +
  ggplot2::labs(title = "mu ~ dnorm(178, 20)",
       y = "density") +
  ggplot2::theme_bw()
```
:::
:::

As there is only one variable $y$ (= `dnorm(x, mean = 178, sd = 20)`),
we need to specify $x$ as a sequence of 1501 points to provide both an
$x$ and a $y$ aesthetic for the plot.

###### $\sigma$ (Tidyverse)

::: my-r-code
::: my-r-code-header
<div>

b: Plot the chosen prior for the standard deviation (Tidyverse)

</div>
:::

::: my-r-code-container
```{r}
#| label: fig-sd-prior-b
#| fig-cap: "Plot the chosen prior for the standard deviation (Tidyverse)"
#| fig-height: 4
#| fig-width: 6

tibble::tibble(x = base::seq(from = -10, to = 60, by = .1)) |>

  ggplot2::ggplot(ggplot2::aes(x = x, y = stats::dunif(x, min = 0, max = 50))) +
  ggplot2::geom_line() +
  ggplot2::scale_x_continuous(breaks = c(0, 50)) +
  ggplot2::scale_y_continuous(NULL, breaks = NULL) +
  ggplot2::ggtitle("sigma ~ dunif(0, 50)") +
  ggplot2::theme_bw()
```
:::
:::

We don't really need the $y$-axis when looking at the shape of a
density, so we'll just remove it with
`scale_y_continuous(NULL, breaks = NULL)`.
:::
:::
:::

#### Prior predictive simulation {#sec-prior-predictive-sim}

::: my-important
::: my-important-header
Simulate the prior predictive distribution!
:::

::: my-important-container
> "`r glossary("Prior predictive simulation")` is very useful for
> assigning sensible priors, because it can be quite hard to anticipate
> how priors influence the observable variables." ([McElreath, 2020, p.
> 83](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=102&annotation=GLSLZXTF))
:::
:::

To see the difference we will look at two prior predictive
distributions:

1.  The first one uses the priors we chose after our reflections in
    @eq-height-linear-model-m4-1.
2.  For the second predictive distribution we will choose a much flatter
    and less informative prior for $\mu$, like
    $\mu \sim Normal(178, 100)$. Priors with such large standard
    deviations are quite common in Bayesian models, but they are hardly
    ever sensible.
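
To get a feeling for what the two priors for $\mu$ imply even before simulating heights, the following base R sketch compares their central 95% ranges and the prior probability of a negative average height. The object names (`prior_sd`, `range_95`, `p_negative_mu`) are my own illustrative choices and not part of the book's code.

```r
## Compare the two candidate priors for mu (illustrative sketch)
prior_sd <- c(informative = 20, flat = 100)

## central 95% range of Normal(178, sd), one column per prior
range_95 <- sapply(prior_sd, function(s) {
  stats::qnorm(c(0.025, 0.975), mean = 178, sd = s)
})
rownames(range_95) <- c("lower", "upper")

## prior probability that the average height mu is below 0 cm
p_negative_mu <- stats::pnorm(0, mean = 178, sd = prior_sd)

print(round(range_95, 1))
print(signif(p_negative_mu, 2))
```

Under the flat prior the lower end of the 95% range is already negative, a first hint that $Normal(178, 100)$ is not a sensible prior for an average human height.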

> "Okay, so how to do this? You can quickly simulate heights by sampling
> from the prior, like you sampled from the posterior back in
> @sec-chap03. Remember, every posterior is also potentially a prior for
> a subsequent analysis, so you can process priors just like
> posteriors." ([McElreath, 2020, p.
> 82](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=101&annotation=TTZZ63ZW))

::: my-example
::: my-example-header
::: {#exm-prior-predictive-sim}
: Prior Predictive Simulation
:::
:::

::: my-example-container
::: panel-tabset
###### Original 1

::: my-r-code
::: my-r-code-header
::: {#cnj-prior-predictive-sim1-a}
a: Simulate heights by sampling from the priors with
$\mu \sim Normal(178, 20)$ (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-prior-predictive-sim1-a
#| fig-cap: "Simulate heights by sampling from the priors with µ ~ Normal(178,20) (Original)"

N_sim_height_a <- 1e4
set.seed(4) # to make example reproducible

## R code 4.14a adapted #######################################
sample_mu_a <- rnorm(N_sim_height_a, 178, 20)
sample_sigma_a <- runif(N_sim_height_a, 0, 50)
priors_height_a <- rnorm(N_sim_height_a, sample_mu_a, sample_sigma_a)
rethinking::dens(priors_height_a, 
                 adj = 1, 
                 norm.comp = TRUE,
                 show.zero = TRUE,
                 col = "red")
graphics::abline(v = 272, lty = 2)
```
:::
:::

The prior predictive simulation generates a plausible distribution:
There are practically no negative values (left dashed vertical line at
0), and a height like that of [Robert Pershing
Wadlow](https://en.wikipedia.org/wiki/Robert_Wadlow) (1918--1940), one
of the tallest people in recorded history at 272 cm (right dashed
vertical line), has only a small probability.

The prior probability distribution of height is not itself Gaussian:
compared with the overlaid normal curve it is too narrow and too high
around the mean, and its tails are too thick. But this is okay.

> "The distribution you see is not an empirical expectation, but rather
> the distribution of relative plausibilities of different heights,
> before seeing the data." ([McElreath, 2020, p.
> 83](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=102&annotation=FJYK52J2))

###### Original 2

::: my-r-code
::: my-r-code-header
::: {#cnj-prior-predictive-sim2-a}
a: Simulate heights by sampling from the priors with
$\mu \sim Normal(178, 100)$ (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-prior-predictive-sim2-a
#| fig-cap: "Simulate heights by sampling from the priors with µ ~ Normal(178,100) (Original)"

N_sim_height_a <- 1e4
set.seed(4) # to make example reproducible

## R code 4.14a adapted #######################################
sample_mu2_a <- rnorm(N_sim_height_a, 178, 100)
sample_sigma2_a <- runif(N_sim_height_a, 0, 50)
priors_height2_a <- rnorm(N_sim_height_a, sample_mu2_a, sample_sigma2_a)
rethinking::dens(priors_height2_a, 
                 adj = 1, 
                 norm.comp = TRUE,
                 show.zero = TRUE,
                 col = "red")
graphics::abline(v = 272, lty = 2)
```
:::
:::

The results of @fig-prior-predictive-sim2-a contradict our scientific
knowledge --- but also our common sense --- about possible height values
of humans. Now the model, before seeing the data, expects people to have
negative height. It also expects some giants. One of the tallest people
in recorded history, [Robert Pershing
Wadlow](https://en.wikipedia.org/wiki/Robert_Wadlow) (1918--1940) stood
272 cm tall. In our prior predictive simulation many people are taller
than this.

> "Does this matter? In this case, we have so much data that the silly
> prior is harmless. But that won't always be the case. There are plenty
> of inference problems for which the data alone are not sufficient, no
> matter how numerous. Bayes lets us proceed in these cases. But only if
> we use our scientific knowledge to construct sensible priors. Using
> scientific knowledge to build priors is not cheating. The important
> thing is that your prior not be based on the values in the data, but
> only on what you know about the data before you see it." ([McElreath,
> 2020, p. 84](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=103&annotation=RDI64UXM))

###### Tidyverse 1

::: my-r-code
::: my-r-code-header
::: {#cnj-prior-predictive-sim1-b}
b: Simulate heights by sampling from the priors with
$\mu \sim Normal(178, 20)$ (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-prior-predictive-sim1-b
#| fig-cap: "Simulate heights by sampling from the priors with µ ~ Normal(178,20) (Tidyverse)"

N_sim_height_b <- 1e4
set.seed(4) # to make example reproducible

## R code 4.14b #######################################

sim_height_b <-
  tibble::tibble(sample_mu_b = stats::rnorm(N_sim_height_b, mean = 178, sd  = 20),
                 sample_sigma_b = stats::runif(N_sim_height_b, min = 0, max = 50)) |> 
  dplyr::mutate(priors_height_b = stats::rnorm(N_sim_height_b, 
                                               mean = sample_mu_b, 
                                               sd = sample_sigma_b))
  
sim_height_b |> 
  ggplot2::ggplot(ggplot2::aes(x = priors_height_b)) +
  ggplot2::geom_density(color = "red") +
  ggplot2::stat_function(
        fun = dnorm,
        args = with(sim_height_b, c(mean = mean(priors_height_b), sd = sd(priors_height_b)))
        ) +
  ggplot2::geom_vline(xintercept = c(0, 272), linetype = "dashed") +
  ggplot2::ggtitle("height ~ dnorm(178, 20)") +
  ggplot2::labs(x = "Height in cm", y = "Density") +
  ggplot2::theme_bw()

```

`ggplot2::geom_density()` computes and draws a kernel density estimate,
which is a smoothed version of the histogram. Note that no data is
mentioned explicitly in the call of `ggplot2::geom_density()`. When this
is the case (`data = NULL`), the data is inherited from the plot data as
specified in the call to `ggplot2::ggplot()`. Otherwise the function
needs a data frame or a function with a single argument to override the
plot data ([geom_density() help
file](https://ggplot2.tidyverse.org/reference/geom_density.html)).
:::
:::

The prior predictive simulation generates a plausible distribution:
There are practically no negative values (left dashed vertical line at
0), and a height like that of [Robert Pershing
Wadlow](https://en.wikipedia.org/wiki/Robert_Wadlow) (1918--1940), one
of the tallest people in recorded history at 272 cm (right dashed
vertical line), has only a small probability.

The prior probability distribution of height is not itself Gaussian:
compared with the overlaid normal curve it is too narrow and too high
around the mean, and its tails are too thick. But this is okay.

> "The distribution you see is not an empirical expectation, but rather
> the distribution of relative plausibilities of different heights,
> before seeing the data." ([McElreath, 2020, p.
> 83](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=102&annotation=FJYK52J2))

###### Tidyverse 2

::: my-r-code
::: my-r-code-header
::: {#cnj-prior-predictive-sim2-b}
b: Simulate heights by sampling from the priors with
$\mu \sim Normal(178, 100)$ (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-prior-predictive-sim2-b
#| fig-cap: "Simulate heights by sampling from the priors with µ ~ Normal(178,100)  (Tidyverse)"

N_sim_height_b <- 1e4
set.seed(4) # to make example reproducible

## R code 4.14b #######################################

sim_height2_b <-
  tibble::tibble(sample_mu2_b = stats::rnorm(N_sim_height_b, mean = 178, sd  = 100),
                 sample_sigma2_b = stats::runif(N_sim_height_b, min = 0, max = 50)) |> 
  dplyr::mutate(priors_height2_b = stats::rnorm(N_sim_height_b, 
                                               mean = sample_mu2_b, 
                                               sd = sample_sigma2_b))
  
sim_height2_b |> 
  ggplot2::ggplot(ggplot2::aes(x = priors_height2_b)) +
  ggplot2::geom_density(color = "red") +
  ggplot2::stat_function(
        fun = dnorm,
        args = with(sim_height2_b, c(mean = mean(priors_height2_b), sd = sd(priors_height2_b)))
        ) +
  ggplot2::geom_vline(xintercept = c(0, 272), linetype = "dashed") +
  ggplot2::ggtitle("height ~ dnorm(178, 100)") +
  ggplot2::labs(x = "Height in cm", y = "Density") +
  ggplot2::theme_bw()

```
:::
:::

The results of @fig-prior-predictive-sim2-b contradict our scientific
knowledge --- but also our common sense --- about possible height values
of humans. Now the model, before seeing the data, expects people to have
negative height. It also expects some giants. One of the tallest people
in recorded history, [Robert Pershing
Wadlow](https://en.wikipedia.org/wiki/Robert_Wadlow) (1918--1940) stood
272 cm tall. In our prior predictive simulation many people are taller
than this.

> "Does this matter? In this case, we have so much data that the silly
> prior is harmless. But that won't always be the case. There are plenty
> of inference problems for which the data alone are not sufficient, no
> matter how numerous. Bayes lets us proceed in these cases. But only if
> we use our scientific knowledge to construct sensible priors. Using
> scientific knowledge to build priors is not cheating. The important
> thing is that your prior not be based on the values in the data, but
> only on what you know about the data before you see it." ([McElreath,
> 2020, p. 84](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=103&annotation=RDI64UXM))
:::
:::
:::
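
The visual impression from both simulations can also be expressed in numbers. The following base R sketch repeats the prior predictive simulation for both priors and reports the share of simulated heights below 0 cm and above 272 cm. The helper `simulate_prior_heights()` and all object names are hypothetical and only serve this illustration; the exact shares depend on the random seed.

```r
## Share of implausible simulated heights under both priors (illustrative sketch)
set.seed(4) # to make example reproducible
n_sim <- 1e4

## hypothetical helper: simulate heights from the prior predictive distribution
simulate_prior_heights <- function(n, mu_sd) {
  mu    <- stats::rnorm(n, mean = 178, sd = mu_sd)
  sigma <- stats::runif(n, min = 0, max = 50)
  stats::rnorm(n, mean = mu, sd = sigma)
}

heights_sd20  <- simulate_prior_heights(n_sim, mu_sd = 20)
heights_sd100 <- simulate_prior_heights(n_sim, mu_sd = 100)

## proportions below 0 cm and above 272 cm for each prior
implausible <- rbind(
  sd_20  = c(below_0   = mean(heights_sd20 < 0),
             above_272 = mean(heights_sd20 > 272)),
  sd_100 = c(below_0   = mean(heights_sd100 < 0),
             above_272 = mean(heights_sd100 > 272))
)
print(implausible)
```

Comparing the two rows shows how much more prior probability the flatter prior puts on impossible or extreme heights.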

### Grid approximation of the posterior distribution

We are going to map out the posterior distribution through brute force
calculations.

This is not recommended because it is

-   laborious and computationally expensive
-   usually so impractical as to be essentially impossible.

Therefore the grid approximation technique has limited relevance. Later
on we will use the quadratic approximation with `rethinking::quap()`.

The strategy is the same grid approximation strategy as before in
@sec-chap02-grid-approx. But now there are two dimensions, and so there
is a geometric (literally) increase in bother.

::: my-procedure
::: my-procedure-header
::: {#prp-grid-approx-r-code}
: Grid approximation as R code (compare with @prp-grid-approx)
:::
:::

::: my-procedure-container
1.  **Define the grid - the first two code lines**: Establish the range
    of $\mu$ and $\sigma$ values that you want to calculate over and
    decide how many points to calculate in-between.
2.  **Compute all combinations of the parameter values on the grid -
    third line**: Expand the $\mu$ and $\sigma$ values into a matrix of
    all combinations of $\mu$ and $\sigma$. The matrix is stored in the
    data frame `post_a`.
3.  **Compute the log-likelihood at each parameter value - fourth
    line**: Because the product of many small probabilities rounds to
    zero, we need to do all calculations on the log scale.
    `base::sapply()` passes the unique combination of $\mu$ and $\sigma$
    on each row of `post_a` to a function that computes the
    log-likelihood of each observed height, and adds all of these
    log-likelihoods together with `base::sum()`.
4.  **Compute the unstandardized posterior at each parameter value -
    fifth line**: Multiply prior by likelihood. Because we are using the
    log scale, this is done by adding the log-prior to the
    log-likelihood.
5.  **Return to probability scale - sixth line**: To prevent rounding to
    zero we can't use the standard approach with
    `base::exp(post_a$prod)`. Instead we subtract the maximum
    log-product from every log-product before exponentiating. The
    result is a relative posterior probability with a maximum of 1.
:::

::: my-important
::: my-important-header
Use log-probability to prevent rounding to zero!
:::

::: my-important-container
> "Remember, in large samples, all unique samples are unlikely. This is
> why you have to work with log-probability." ([McElreath, 2020, p.
> 562](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=581&annotation=7LPX8HMD))
:::
:::
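
The rounding problem can be sketched with a few lines of base R. The example below uses simulated stand-in heights (`y_sim`) instead of the real data, and all object names are my own assumptions for illustration. It shows that the naive product of densities collapses to zero, while the sum of log-densities stays finite and can be rescaled by its maximum.

```r
## Why the log scale is needed (illustrative sketch with simulated heights)
set.seed(4)
y_sim <- stats::rnorm(500, mean = 154, sd = 8) # stand-in for observed heights

## naive likelihood: the product of 500 small densities underflows to 0
lik_naive <- prod(stats::dnorm(y_sim, mean = 154, sd = 8))

## log-likelihood: the sum of log-densities stays finite
log_lik <- sum(stats::dnorm(y_sim, mean = 154, sd = 8, log = TRUE))

## compare three candidate values of mu on the log scale
mu_candidates <- c(152, 154, 156)
log_lik_mu <- sapply(mu_candidates, function(m) {
  sum(stats::dnorm(y_sim, mean = m, sd = 8, log = TRUE))
})

## exponentiating directly gives 0 for every candidate
lik_direct <- exp(log_lik_mu)

## subtracting the maximum first keeps the relative sizes
rel_lik <- exp(log_lik_mu - max(log_lik_mu))

print(c(lik_naive = lik_naive, log_lik = log_lik))
print(rbind(mu_candidates, lik_direct, rel_lik))
```

After rescaling, the best candidate has the value 1 and the others are expressed relative to it, which is all we need for comparing grid points.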

::: my-example
::: my-example-header
<div>

: Grid approximation of the posterior distribution

</div>
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-grid-approx-post-dist-a}
a: Grid approximation of the posterior distribution (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: grid-approx-post-dist-a
#| fig-cap: "Contour plot (left) and heat map (right)"
#| fig-show: hold

## R code 4.16a Grid approx. ##################################

## 1. Define the grid ##########
mu.list_a <- seq(from = 150, to = 160, length.out = 100)  
sigma.list_a <- seq(from = 7, to = 9, length.out = 100)   

## 2. All Combinations of μ & σ ##########
post_a <- expand.grid(mu_a = mu.list_a, sigma_a = sigma.list_a) 

## 3. Compute log-likelihood #######
post_a$LL <- sapply(1:nrow(post_a), function(i) {
  sum(
    dnorm(d2_a$height, post_a$mu_a[i], post_a$sigma_a[i], log = TRUE)
  )
})

## 4. Multiply prior by likelihood ##########
## as the priors are on the log scale adding = multiplying
post_a$prod <- post_a$LL + dnorm(post_a$mu_a, 178, 20, TRUE) +    
  dunif(post_a$sigma_a, 0, 50, TRUE)                              

## 5. Back to probability scale #########
## without rounding error 
post_a$prob <- exp(post_a$prod - max(post_a$prod))       

## define plotting area as one row and two columns
par(mfrow = c(1, 2))

## R code 4.17a Contour plot ##################################
rethinking::contour_xyz(post_a$mu_a, post_a$sigma_a, post_a$prob)

## R code 4.18a Heat map ##################################
rethinking::image_xyz(post_a$mu_a, post_a$sigma_a, post_a$prob)
```
:::
:::

###### Tidyverse

::: my-r-code
::: my-r-code-header
<div>

b: Grid approximation of the posterior distribution (Tidyverse)

</div>
:::

::: my-r-code-container
```{r}
#| label: grid-approx-post-dist-b
#| fig-cap: "Contour plot (left) and heat map (right)"
#| fig-show: hold
#| fig-height: 5

## R code 4.16b Grid approx ##################################

## 1./2. Define grid & combinations ##########
d_grid_b <-
  tidyr::expand_grid(mu_b = base::seq(from = 150, to = 160, length.out = 100),
                  sigma_b = base::seq(from = 7, to = 9, length.out = 100))

## 3a. Compute log-likelihood #######
grid_function <- function(mu, sigma) {
  stats::dnorm(d2_b$height, mean = mu, sd = sigma, log = TRUE) |> 
    sum()
}

d_grid2_b <-
  d_grid_b |> 
  
  ## 3b. Compute log-likelihood #######
  dplyr::mutate(log_likelihood_b = purrr::map2(mu_b, sigma_b, grid_function)) |>
  tidyr::unnest(log_likelihood_b) |> 
  dplyr::mutate(prior_mu_b  = stats::dnorm(mu_b, mean = 178, sd = 20, log = TRUE),
         prior_sigma_b      = stats::dunif(sigma_b, min = 0, max = 50, log = TRUE)) |> 
  
  ## 4. Multiply prior by likelihood ##########
  ## as the priors are on the log scale adding = multiplying
  dplyr::mutate(product_b = log_likelihood_b + prior_mu_b + prior_sigma_b) |> 
  
  ## 5. Back to probability scale #########
  dplyr::mutate(probability_b = exp(product_b - max(product_b)))


## R code 4.17b Contour plot ##################################
p1_b <- 
  d_grid2_b |> 
    ggplot2::ggplot(ggplot2::aes(x = mu_b, y = sigma_b, z = probability_b)) + 
    ggplot2::geom_contour() +
    ggplot2::labs(x = base::expression(mu),
                  y = base::expression(sigma)) +
    ggplot2::coord_cartesian(xlim = c(153.5, 155.7),
                         ylim = c(7, 8.5)) +
    ggplot2::theme_bw()

## R code 4.18b Heat map ##################################
p2_b <- 
  d_grid2_b |> 
    ggplot2::ggplot(ggplot2::aes(x = mu_b, y = sigma_b, fill = probability_b)) + 
    ggplot2::geom_raster(interpolate = TRUE) +
    ggplot2::scale_fill_viridis_c(option = "B") +
    ggplot2::labs(x = base::expression(mu),
         y = base::expression(sigma)) +
    ggplot2::coord_cartesian(xlim = c(153.5, 155.7),
                         ylim = c(7, 8.5)) +
    ggplot2::theme_bw()

library(patchwork)
p1_b + p2_b

```

-   `purrr::map2()` returns a list, so `log_likelihood_b` is at first a
    list-column with one number per cell. We therefore have to use the
    `tidyr::unnest()` function to flatten it into an ordinary numeric
    column.
-   With `ggplot2::coord_cartesian()` I zoomed into the graph to
    concentrate on the most important x/y ranges.
-   The axes use the symbols of $\mu$ and $\sigma$ provided by
    unevaluated expressions through `base::expression()`.
:::
:::
:::
:::
:::

| PURPOSE                        | ORIGINAL                  | TIDYVERSE               |
|---------------------------|-----------------------|----------------------|
| All combinations               | base::expand.grid()       | tidyr::expand_grid()    |
| Apply function to each element | base::sapply()            | purrr::map2()           |
| Contour plot                   | rethinking::contour_xyz() | ggplot2::geom_contour() |
| Heat map                       | rethinking::image_xyz()   | ggplot2::geom_raster()  |

: Function equivalence between Original and Tidyverse
{#tbl-functions-equivalence-rethinking-tidyverse}

::: my-note
::: my-note-header
Creating a data frame or tibble of all combinations
:::

::: my-note-container
There are several functions in {**tidyverse**} related to
`base::expand.grid()`:

-   `tidyr::expand_grid()`: Creates a tibble from all combinations of
    its inputs. This is the function most similar to
    `base::expand.grid()`, as its inputs are vectors rather than a data
    frame. But it differs in five aspects:
    1.  Produces sorted output (by varying the first column the slowest,
        rather than the fastest).
    2.  Returns a tibble, not a data frame.
    3.  Never converts strings to factors.
    4.  Does not add any additional attributes.
    5.  Can expand any generalised vector, including data frames.
-   `tidyr::expand()`: Generates all combinations of variables found in
    a dataset. It is paired with
    1.  `tidyr::crossing()`: A wrapper around `tidyr::expand_grid()`
        that de-duplicates and sorts the inputs.
    2.  `tidyr::nesting()`: Finds only combinations already present in
        the data.
:::
:::

### Sampling from the posterior {#sec-chap04-sampling-heights-from-posterior}

> "To study this posterior distribution in more detail, again I'll push
> the flexible approach of sampling parameter values from it. This works
> just like it did in @cnj-chap03-sample-posterior-globe-tossing, when
> you sampled values of $p$ from the posterior distribution for the
> globe tossing example." ([McElreath, 2020, p.
> 85](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=104&annotation=L8LYZ2VZ))

::: my-procedure
::: my-procedure-header
::: {#prp-chap04-sampling-heights-from-posterior}
: Sampling from the posterior
:::
:::

::: my-procedure-container
Since there are two parameters, and we want to sample combinations of
them:

1.  Randomly sample row numbers in `post_a` in proportion to the values
    in `post_a$prob`.
2.  Pull out the parameter values on those randomly sampled rows.
:::
:::
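
One detail of step 1 is easy to miss: the posterior values on the grid are only scaled relative to their maximum and do not sum to one. This is not a problem, because the `prob` argument of `base::sample()` accepts weights that need not sum to one. The toy grid below (three made-up rows; the names `toy_grid`, `rows_sampled` and `mu_sampled` are purely illustrative) sketches both steps of the procedure.

```r
## Sampling rows in proportion to unnormalized weights (toy sketch)
set.seed(4)
toy_grid <- data.frame(mu   = c(153, 154, 155),
                       prob = c(0.2, 1.0, 0.4)) # relative weights, max = 1

## 1. sample row numbers in proportion to toy_grid$prob
rows_sampled <- sample(seq_len(nrow(toy_grid)), size = 1e4,
                       replace = TRUE, prob = toy_grid$prob)

## 2. pull out the parameter values on those rows
mu_sampled <- toy_grid$mu[rows_sampled]

## observed shares should be close to the normalized weights
observed <- as.numeric(table(factor(mu_sampled, levels = toy_grid$mu))) /
  length(mu_sampled)
expected <- toy_grid$prob / sum(toy_grid$prob)
print(round(rbind(observed, expected), 3))
```

The observed shares approximate the normalized weights, so rows with a higher posterior value are drawn more often.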

::: my-example
::: my-example-header
::: {#exm-chap04-sampling-heights-from-posterior}
: Sampling from the posterior
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-sampling-heights-from-posterior-a}
a: Samples from the posterior distribution for the heights data
(Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-posterior-sample-heights-a
#| fig-cap: "Samples from the posterior distribution for the heights data. (Original)"

## R code 4.19a ###########################

# 1. Sample row numbers #########
# randomly sample row numbers in post_a 
# in proportion to the values in post_a$prob. 
sample.rows <- sample(1:nrow(post_a),
  size = 1e4, replace = TRUE,
  prob = post_a$prob
)

# 2. pull out parameter values ########
sample.mu_a <- post_a$mu_a[sample.rows]
sample.sigma_a <- post_a$sigma_a[sample.rows]

## R code 4.20a ###########################
plot(sample.mu_a, sample.sigma_a, 
     cex = 0.8, pch = 21, 
     col = rethinking::col.alpha(rethinking:::rangi2, 0.1)
     )
```

-   `rethinking::col.alpha()` is part of the {**rethinking**} R package.
    It makes colors transparent for a better inspection of values where
    data overlap.
-   `rethinking:::rangi2` itself is just the [definition of a hex color
    code](https://github.com/rmcelreath/rethinking/blob/2f01a9c5dac4bc6e9a6f95eec7cae268200a8181/R/colors.r#L22)
    ("#8080FF") specifying the shade of blue.

Adjust the plot to your tastes by playing around with `cex` (character
expansion, the size of the points), `pch` (plot character), and the
$0.1$ transparency value.
:::
:::

The density of points is highest in the center, reflecting the most
plausible combinations of $\mu$ and $\sigma$. There are many more ways
for these parameter values to produce the data, conditional on the
model.

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-sampling-heights-from-posterior-b}
b: Samples from the posterior distribution for the heights data
(Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-posterior-sample-b
#| fig-cap: "Samples from the posterior distribution for the heights data. (Tidyverse)"

set.seed(4)

d_grid_samples_b <- 
  d_grid2_b |> 
  dplyr::slice_sample(n = 1e4, 
               replace = TRUE, 
               weight_by = probability_b
               )

d_grid_samples_b |> 
  ggplot2::ggplot(ggplot2::aes(x = mu_b, y = sigma_b)) + 
  ggplot2::geom_point(size = 1.8, alpha = 1/15, color = "#8080FF") +
  ggplot2::scale_fill_viridis_c() +
  ggplot2::labs(x = expression(mu[samples]),
       y = expression(sigma[samples])) +
  ggplot2::theme_bw()


```

Kurz used the superseded `dplyr::sample_n()` to sample rows with
replacement from `d_grid2_b`. I used instead the newer
`dplyr::slice_sample()`, which should now be used to sample rows.
:::
:::

The density of points is highest in the center, reflecting the most
plausible combinations of μ and σ. There are many more ways for these
parameter values to produce the data, conditional on the model.
:::
:::
:::

#### Marginal posterior densities of μ and σ

The jargon `r glossary("Marginal Distribution", "marginal")` here means
"averaging over the other parameters."

We can describe the distribution of confidence in each combination of
$\mu$ and $\sigma$ by summarizing the samples. Think of them like data
and describe them, just like in @sec-chap03-sampling-to-summarize. For
example, to characterize the shapes of the marginal posterior densities
of $\mu$ and $\sigma$, all we need to do is to call `rethinking::dens()`
with the appropriate vector, `sample.mu_a` or `sample.sigma_a`.
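
On a grid, "averaging over the other parameter" amounts to summing the normalized joint probabilities over that parameter. The following base R sketch illustrates this with simulated stand-in heights and a coarse 50 x 50 grid. The data, the grid ranges and all object names are assumptions made for this illustration only.

```r
## Marginal posteriors from a small grid (illustrative sketch, simulated data)
set.seed(4)
y_sim <- stats::rnorm(100, mean = 154, sd = 8) # stand-in for observed heights

grid_sim <- expand.grid(mu    = seq(from = 150, to = 158, length.out = 50),
                        sigma = seq(from = 5,   to = 11,  length.out = 50))

## log-posterior = log-likelihood + log-priors (same priors as in the text)
grid_sim$log_post <-
  mapply(function(m, s) sum(stats::dnorm(y_sim, mean = m, sd = s, log = TRUE)),
         grid_sim$mu, grid_sim$sigma) +
  stats::dnorm(grid_sim$mu, mean = 178, sd = 20, log = TRUE) +
  stats::dunif(grid_sim$sigma, min = 0, max = 50, log = TRUE)

## normalized posterior probability of every grid cell
grid_sim$prob <- exp(grid_sim$log_post - max(grid_sim$log_post))
grid_sim$prob <- grid_sim$prob / sum(grid_sim$prob)

## marginalize: sum the joint probabilities over the other parameter
marginal_mu    <- tapply(grid_sim$prob, grid_sim$mu, sum)
marginal_sigma <- tapply(grid_sim$prob, grid_sim$sigma, sum)

## grid value of mu with the highest marginal probability
mu_mode <- sort(unique(grid_sim$mu))[which.max(marginal_mu)]
print(c(mu_mode = mu_mode, sample_mean = mean(y_sim)))
```

Sampling rows and looking at one column of the samples, as done in this section, approximates the same marginal distributions without any explicit summation.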

::: my-example
::: my-example-header
::: {#exm-chap04-heights-posterior-densities}
: Marginal posterior densities of $\mu$ and $\sigma$ for the heights
data
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-heights-posterior-densities-a}
a: Marginal posterior densities of $\mu$ and $\sigma$ for the heights
data (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-heights-posterior-densities-a
#| fig-cap: "Marginal posterior densities of μ and σ for the heights data (Original)"


## define plotting area as one row and two columns
par(mfrow = c(1, 2))

## R code 4.21a adapted #########################
rethinking::dens(sample.mu_a, adj = 1, show.HPDI = 0.89, 
                 norm.comp = TRUE, col = "red")
rethinking::dens(sample.sigma_a, adj = 1, show.HPDI = 0.89,
                 norm.comp = TRUE, col = "red")
```

For comparison I have overlaid the normal distribution and shown the 89%
HPDI. Compare the grayed area with the calculation of the values in
@exm-chap04-summarize-pi.
:::
:::

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-heights-posterior-densities-b}
b: Marginal posterior densities of $\mu$ and $\sigma$ for the heights
data (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-heights-posterior-densities-b
#| fig-cap: "Marginal posterior densities of μ and σ for the heights data (Tidyverse)"

d_grid_samples_b |> 
  tidyr::pivot_longer(mu_b:sigma_b) |> 

  ggplot2::ggplot(ggplot2::aes(x = value)) + 
  ggplot2::geom_density(color = "red") +
  # ggplot2::scale_y_continuous(NULL, breaks = NULL) +
  # ggplot2::xlab(NULL) +
  ggplot2::stat_function(
      fun = dnorm,
      args = with(d_grid_samples_b, c(
        mean = mean(mu_b),
        sd = sd(mu_b)))
      ) +
  ggplot2::stat_function(
    fun = dnorm,
    args = with(d_grid_samples_b, c(
      mean = mean(sigma_b),
      sd = sd(sigma_b)))
    ) +
  ggplot2::labs(x = "mu (left), sigma (right)",
                y = "Density") +
  ggplot2::theme_bw() +
  ggplot2::facet_wrap(~ name, scales = "free",
                      labeller = ggplot2::label_parsed)
```

Kurz used `tidyr::pivot_longer()` and then `ggplot2::facet_wrap()` to
plot the densities for both `mu` and `sigma` at once. For comparison I
have overlaid the normal distribution. But I do not know how to prevent
the base line at density = 0. Presumably it appears because each
`ggplot2::stat_function()` layer is drawn in both facets, so that the
curve fitted to the other parameter is practically zero over the
displayed range. See Tidyverse 2
(@fig-chap04-heights-posterior-densities2-b) where I have constructed
the plots of both distributions separately.
:::
:::

###### Tidyverse 2

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-heights-posterior-densities2-b}
b: Marginal posterior densities of $\mu$ and $\sigma$ for the heights
data (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-heights-posterior-densities2-b
#| fig-cap: "Marginal posterior densities of μ and σ for the heights data (Tidyverse)"

plot_mu_b <-
  d_grid_samples_b |> 
    ggplot2::ggplot(ggplot2::aes(x = mu_b)) + 
    ggplot2::geom_density(color = "red") +
    ggplot2::stat_function(
        fun = dnorm,
        args = with(d_grid_samples_b, c(
          mean = mean(mu_b), 
          sd = sd(mu_b)))
        ) +
    ggplot2::labs(x = expression(mu),
                  y = "Density") +
    ggplot2::theme_bw()

plot_sigma_b <- 
  d_grid_samples_b |> 
    ggplot2::ggplot(ggplot2::aes(x = sigma_b)) + 
    ggplot2::geom_density(color = "red") +
    ggplot2::stat_function(
        fun = dnorm,
        args = with(d_grid_samples_b, c(
          mean = mean(sigma_b), 
          sd = sd(sigma_b)))
        ) +
    ggplot2::labs(x = expression(sigma),
                  y = "Density") +
    ggplot2::theme_bw()

library(patchwork)
plot_mu_b + plot_sigma_b
```
:::
:::
:::

> "These densities are very close to being normal distributions. And
> this is quite typical. As sample size increases, posterior densities
> approach the normal distribution. If you look closely, though, you'll
> notice that the density for $\sigma$ has a longer right-hand tail.
> I'll exaggerate this tendency a bit later, to show you that this
> condition is very common for standard deviation parameters."
> ([McElreath, 2020, p.
> 86](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=105&annotation=U26FFH8V))
:::
:::

#### Posterior compatibility intervals (PIs)

Since the drawn samples of @exm-chap04-sampling-heights-from-posterior
are just vectors of numbers, you can compute any statistic from them
that you could from ordinary data: mean, median, or quantile, for
example.

As examples we will compute `r glossary("PI")` and
`r glossary("HPDI")`/`r glossary("HDCI")`.

We'll use the {**tidybayes**} and {**ggdist**} packages to compute the
posterior modes together with 89% intervals (rather than the
conventional 95% intervals, following McElreath's recommendation).
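
Because the samples are plain numeric vectors, the two interval types can also be sketched in base R. The code below is only an illustration under my own assumptions: it uses skewed stand-in draws instead of the height samples, and `shortest_interval()` is a hypothetical helper written for this note, not a function of {**rethinking**} or {**ggdist**}.

```r
## Percentile interval vs. shortest interval in base R (illustrative sketch)
set.seed(4)
draws_sim <- stats::rgamma(1e4, shape = 2, rate = 1) # skewed stand-in samples

## ordinary summaries work as for any data vector
print(c(mean = mean(draws_sim), median = stats::median(draws_sim)))

## 89% percentile interval: cut 5.5% of the draws from each tail
pi_89 <- stats::quantile(draws_sim, probs = c(0.055, 0.945))

## hypothetical helper: narrowest interval containing `prob` of the draws
shortest_interval <- function(x, prob = 0.89) {
  x <- sort(x)
  n <- length(x)
  k <- ceiling(prob * n)               # number of draws inside the interval
  widths <- x[k:n] - x[1:(n - k + 1)]  # width of every window of k draws
  i <- which.min(widths)               # start of the narrowest window
  c(lower = x[i], upper = x[i + k - 1])
}
hdi_89 <- shortest_interval(draws_sim, prob = 0.89)

print(rbind(percentile = unname(pi_89), shortest = unname(hdi_89)))
```

For the skewed stand-in draws the shortest interval is narrower and sits further to the left than the percentile interval. For nearly symmetric samples both intervals should be almost identical.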

::: my-resource
::: my-resource-header
{**tidybayes**} has a companion package {**ggdist**}
:::

::: my-resource-container
There is a companion package {**ggdist**} which is imported completely
by {**tidybayes**}. Whenever you cannot find a function in
{**tidybayes**}, look at the documentation of {**ggdist**}. This is
also the case for the `tidybayes::mode_hdi()` function. In the help
files of {**tidybayes**} you will just find notes about the deprecated
`tidybayes::mode_hdih()` function, but not the arguments of its new
version `tidybayes::mode_hdi()` without the last `h` (for horizontal).
You can look up these details in the {**ggdist**} documentation.
This observation is valid for many families of deprecated functions.

There is a division of functionality between {**tidybayes**} and
{**ggdist**}:

-   {**tidybayes**}: Tidy Data and 'Geoms' for Bayesian Models: Compose
    data for and extract, manipulate, and visualize posterior draws from
    Bayesian models in a tidy data format. Functions are provided to
    help extract tidy data frames of draws from Bayesian models and that
    generate point summaries and intervals in a tidy format.
-   {**ggdist**}: Visualizations of Distributions and Uncertainty:
    Provides primitives for visualizing distributions using
    {**ggplot2**} that are particularly tuned for visualizing
    uncertainty in either a frequentist or Bayesian mode. Both
    analytical distributions (such as frequentist confidence
    distributions or Bayesian priors) and distributions represented as
    samples (such as bootstrap distributions or Bayesian posterior
    samples) are easily visualized.
:::
:::

::: my-example
::: my-example-header
::: {#exm-chap04-summarize-pi}
: Summarize the widths with posterior compatibility intervals
:::
:::

::: my-example-container
::: panel-tabset
###### PI (Original)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-summarize-pi-a}
a: Posterior compatibility interval (PI) (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-summarize-pi-a
#| results: hold

## R code 4.22a ####################
rethinking::PI(sample.mu_a)
rethinking::PI(sample.sigma_a)
```

The first two lines refer to the heights $\mu$ samples, the other lines
to the heights $\sigma$ samples.
:::
:::

###### HPDI (Original)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-summarize-hpdi-a}
a: Highest Posterior Density Interval (HPDI) (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-summarize-hpdi-a
#| results: hold

## R code 4.22a ####################
rethinking::HPDI(sample.mu_a)
rethinking::HPDI(sample.sigma_a)
```

The first two lines refer to the heights $\mu$ samples, the other lines
to the heights $\sigma$ samples.
:::
:::

###### PI (Tidyverse)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-summarize-pi-b}
b: Posterior compatibility interval (PI) (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-summarize-pi-b

d_grid_samples_b |> 
  tidyr::pivot_longer(mu_b:sigma_b) |> 
  dplyr::group_by(name) |> 
  ggdist::mode_qi(value, .width = 0.89) # quantile (percentile) interval
```
:::
:::

###### HDCI (Tidyverse)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-summarize-hpdi-b}
b: Highest Density Continuous Interval (HDCI) (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-summarize-hdci-b

d_grid_samples_b |> 
  tidyr::pivot_longer(mu_b:sigma_b) |> 
  dplyr::group_by(name) |> 
  ggdist::mode_hdci(value, .width = 0.89) 
```

In {**tidybayes**} and {**ggdist**} the shortest probability interval
(= Highest Posterior Density Interval (HPDI)) is called Highest Density
Continuous Interval (HDCI).
:::
:::
:::
:::
:::

::: my-note
::: my-note-header
{**rethinking**} versus {**tidybayes/ggdist**}
:::

::: my-note-container
The results of {**rethinking**} and {**ggdist/tidybayes**} differ
slightly, but these differences are not important.
:::
:::
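
The difference between the two kinds of interval is easier to see on a
skewed distribution. What follows is a minimal base R sketch of the two
ideas, not the implementation used by {**rethinking**} or {**ggdist**}.
The gamma draws and all object names (`skewed_draws`,
`shortest_interval()`) are assumptions made up for this illustration.

```{r}
#| label: chap04-sketch-pi-versus-hdi

## hypothetical right-skewed "posterior" draws (not the height data)
set.seed(4)
skewed_draws <- stats::rgamma(1e4, shape = 2, rate = 1)

## percentile interval: cut 5.5% of the draws from each tail
pi_89 <- stats::quantile(skewed_draws, probs = c(0.055, 0.945))

## shortest interval: slide a window that holds 89% of the sorted draws
## and keep the narrowest window
shortest_interval <- function(draws, width = 0.89) {
  sorted <- sort(draws)
  n_inside <- ceiling(width * length(sorted))
  starts <- seq_len(length(sorted) - n_inside + 1)
  window_widths <- sorted[starts + n_inside - 1] - sorted[starts]
  best <- which.min(window_widths)
  c(lower = sorted[best], upper = sorted[best + n_inside - 1])
}
hdi_89 <- shortest_interval(skewed_draws)

rbind(percentile = unname(pi_89), shortest = unname(hdi_89))
```

For a symmetric posterior such as the one for $\mu$ both intervals nearly
coincide. For a skewed one the shortest interval is shifted towards the
bulk of the distribution and is narrower.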

> "Before moving on to using quadratic approximation (quap) as shortcut
> to all of this inference, it is worth repeating the analysis of the
> height data above, but now with only a fraction of the original data.
> The reason to do this is to demonstrate that, in principle, the
> posterior is not always so Gaussian in shape. There's no trouble with
> the mean, μ. For a Gaussian likelihood and a Gaussian prior on μ, the
> posterior distribution is always Gaussian as well, regardless of
> sample size. It is the standard deviation σ that causes problems. So
> if you care about σ---often people do not---you do need to be careful
> of abusing the quadratic approximation." ([McElreath, 2020, p.
> 86](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=105&annotation=HP4LEB6D))

::: my-example
::: my-example-header
::: {#exm-chap04-sample-size-sigma}
: Sample size and the normality of $\sigma$'s posterior
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-sample-size-sigma-a}
a: Sample size and the normality of $\sigma$'s posterior (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-sample-size-sigma-a
#| fig-cap: "Sample 20 heights to see that sigma is not Gaussian anymore (Original)"

## R code 4.23a ######################################
d3_a <- sample(d2_a$height, size = 20)

## R code 4.24a ######################################
mu2_a.list <- seq(from = 150, to = 170, length.out = 200)
sigma2_a.list <- seq(from = 4, to = 20, length.out = 200)
post2_a <- expand.grid(mu = mu2_a.list, sigma = sigma2_a.list)
post2_a$LL <- sapply(1:nrow(post2_a), function(i) {
  sum(dnorm(d3_a,
    mean = post2_a$mu[i], sd = post2_a$sigma[i],
    log = TRUE
  ))
})
post2_a$prod <- post2_a$LL + dnorm(post2_a$mu, 178, 20, TRUE) +
  dunif(post2_a$sigma, 0, 50, TRUE)
post2_a$prob <- exp(post2_a$prod - max(post2_a$prod))
sample2_a.rows <- sample(1:nrow(post2_a),
  size = 1e4, replace = TRUE,
  prob = post2_a$prob
)
sample2_a.mu <- post2_a$mu[sample2_a.rows]
sample2_a.sigma <- post2_a$sigma[sample2_a.rows]

## define plotting area as one row and two columns
par(mfrow = c(1, 2))
plot(sample2_a.mu, sample2_a.sigma,
  cex = 0.5,
  col = rethinking::col.alpha(rethinking:::rangi2, 0.1),
  xlab = "mu", ylab = "sigma", pch = 16
)

## R code 4.25a ############
rethinking::dens(sample2_a.sigma, 
                 norm.comp = TRUE,
                 col = "red")

```
:::
:::

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-sample-size-sigma-b}
b: Sample size and the normality of $\sigma$'s posterior (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-sample-size-sigma-b
#| fig-cap: "Sample 20 heights to see that sigma is not Gaussian anymore (Tidyverse)"

set.seed(4)
d3_b <- sample(d2_b$height, size = 20)

n <- 200

# note we've redefined the ranges of `mu` and `sigma`
d3_grid_b <-
  tidyr::crossing(mu3_b    = seq(from = 150, to = 170, length.out = n),
           sigma3_b = seq(from = 4, to = 20, length.out = n))

grid_function3_b <- function(mu, sigma) {
  dnorm(d3_b, mean = mu, sd = sigma, log = T) |> 
    sum()
}

d3_grid_b <-
  d3_grid_b |>  
  dplyr::mutate(log_likelihood3_b = 
         purrr::map2_dbl(mu3_b, sigma3_b, grid_function3_b))  |>  
  dplyr::mutate(prior3_mu_b    = stats::dnorm(mu3_b, mean = 178, sd = 20, log = T),
         prior3_sigma_b = stats::dunif(sigma3_b, min = 0, max = 50, log = T)) |> 
  dplyr::mutate(product3_b = log_likelihood3_b + prior3_mu_b + prior3_sigma_b) |> 
  dplyr::mutate(probability3_b = base::exp(product3_b - base::max(product3_b)))

set.seed(4)

d3_grid_samples_b <- 
  d3_grid_b |> 
  dplyr::slice_sample(n = 1e4, 
           replace = T, 
           weight_by = probability3_b)

plot3_d3_scatterplot <- 
  d3_grid_samples_b |> 
    ggplot2::ggplot(ggplot2::aes(x = mu3_b, y = sigma3_b)) + 
    ggplot2::geom_point(size = 1.8, alpha = 1/15, color = "#8080FF") +
    ggplot2::labs(x = base::expression(mu[samples]),
         y = base::expression(sigma[samples])) +
    ggplot2::theme_bw()

plot3_d3_sigma3_b <- 
  d3_grid_samples_b |> 
    ggplot2::ggplot(ggplot2::aes(x = sigma3_b)) + 
    ggplot2::geom_density(color = "red") +
    ggplot2::stat_function(
        fun = dnorm,
        args = with(d3_grid_samples_b, c(
          mean = mean(sigma3_b), 
          sd = sd(sigma3_b)))
        ) +
    ggplot2::labs(x = expression(sigma),
                  y = "Density") +
    ggplot2::theme_bw()

library(patchwork)
plot3_d3_scatterplot + plot3_d3_sigma3_b

```
:::
:::
:::

Compare the left panel with @fig-posterior-sample-b and the right panel
with the right panel of @fig-chap04-heights-posterior-densities2-b to
see that now $\sigma$ has a long right tail and does not follow a
Gaussian distribution.

> "The deep reasons for the posterior of σ tending to have a long
> right-hand tail are complex. But a useful way to conceive of the
> problem is that variances must be positive. As a result, there must be
> more uncertainty about how big the variance (or standard deviation) is
> than about how small it is." ([McElreath, 2020, p.
> 86](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=105&annotation=RRXCJMTH))

> "For example, if the variance is estimated to be near zero, then you
> know for sure that it can't be much smaller. But it could be a lot
> bigger." ([McElreath, 2020, p.
> 87](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=106&annotation=7XZ87ZJT))
:::
:::
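
The asymmetry can also be checked numerically. The following sketch
simplifies the problem on purpose: it uses 20 simulated heights instead
of the `Howell1` data, fixes $\mu$ at the sample mean and puts a flat
prior on $\sigma$. These are assumptions for illustration only, as are
all object names. If the posterior of $\sigma$ were symmetric, its mode
and its mean would coincide.

```{r}
#| label: chap04-sketch-sigma-skew

## simulated stand-in for a small sample of 20 heights
set.seed(4)
toy_heights <- stats::rnorm(20, mean = 154.6, sd = 7.7)

## grid over sigma only; mu is held fixed at the sample mean
sigma_grid <- seq(from = 4, to = 20, length.out = 1000)
log_post <- sapply(sigma_grid, function(s) {
  sum(stats::dnorm(toy_heights, mean = mean(toy_heights), sd = s, log = TRUE))
})

## normalize on the grid
post <- exp(log_post - max(log_post))
post <- post / sum(post)

sigma_mode <- sigma_grid[which.max(post)]
sigma_mean <- sum(sigma_grid * post)
c(mode = sigma_mode, mean = sigma_mean)
```

The mean lies to the right of the mode, which is what a long right tail
looks like in numbers.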

### Finding the posterior distribution with quadratic approximation

> "To build the quadratic approximation, we'll use `quap()`, a command
> in the {**rethinking**} package. The `quap()` function works by using
> the model definition you were introduced to earlier in this chapter.
> Each line in the definition has a corresponding definition in the form
> of R code. The engine inside quap then uses these definitions to
> define the posterior probability at each combination of parameter
> values. Then it can climb the posterior distribution and find the
> peak, its `r glossary("MAP")` (**Maximum A Posteriori** estimate).
> Finally, it estimates the quadratic curvature at the MAP to produce an
> approximation of the posterior distribution. Remember: This procedure
> is very similar to what many non-Bayesian procedures do, just without
> any priors." ([McElreath, 2020, p.
> 87](zotero://select/groups/5243560/items/NFUEVASQ), parenthesis and
> emphasis are mine)
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=106&annotation=FFDN2FE2))

::: my-procedure
::: my-procedure-header
::: {#prp-chap04-m4-1}
: Finding the posterior distribution
:::
:::

::: my-procedure-container
1.  We start with the `Howell1` data frame for adults (age \>= 18)
    (**Code 4.26**).
2.  We place the R code equivalents into `base::alist()`. We are going to
    use @eq-height-linear-model-m4-1 (**Code 4.27**).
3.  We can add some additional options like start values (**Code 4.30**).
4.  We fit the model to the data of our data frame and store the fitted
    model (**Code 4.28**).
5.  Now we can have a look at the posterior distribution (**Code
    4.29**).
:::
:::
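
The two steps named in the quote, climbing to the peak and measuring the
curvature there, can be sketched with `stats::optim()`. This is only an
illustration of the principle under stated assumptions: simulated heights
replace the `Howell1` data, and `neg_log_post()` and the other names are
hypothetical. It is not meant to reproduce what `rethinking::quap()` does
internally.

```{r}
#| label: chap04-sketch-quadratic-approximation

## simulated stand-in for the adult heights
set.seed(4)
sim_adults <- stats::rnorm(350, mean = 154.6, sd = 7.7)

## negative log-posterior: Gaussian likelihood, Normal(178, 20) prior on mu,
## Uniform(0, 50) prior on sigma
neg_log_post <- function(par, y) {
  mu <- par[1]
  sigma <- par[2]
  if (sigma <= 0 || sigma >= 50) return(Inf) # outside the uniform prior
  -(sum(stats::dnorm(y, mean = mu, sd = sigma, log = TRUE)) +
      stats::dnorm(mu, mean = 178, sd = 20, log = TRUE) +
      stats::dunif(sigma, min = 0, max = 50, log = TRUE))
}

## step 1: find the peak (MAP); step 2: ask for the curvature (Hessian) there
fit <- stats::optim(
  par = c(mean(sim_adults), stats::sd(sim_adults)),
  fn = neg_log_post, y = sim_adults, hessian = TRUE
)

par_names <- c("mu", "sigma")
map_est <- stats::setNames(fit$par, par_names)

## the inverse Hessian of the negative log-posterior is the covariance
## matrix of the Gaussian approximation
approx_vcov <- solve(fit$hessian)
dimnames(approx_vcov) <- list(par_names, par_names)
approx_sd <- stats::setNames(sqrt(diag(approx_vcov)), par_names)

rbind(map = map_est, sd = approx_sd)
```

The start values play the same role as in **Code 4.30**: they tell the
optimizer where to begin climbing.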

::: my-example
::: my-example-header
::: {#exm-chap04-m4-1}
: Finding the posterior distribution
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-m4-1a}
a: Finding the posterior distribution with `rethinking::quap()`
(Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-m4-1a
#| cache: true

## R code 4.26a ######################
data(package = "rethinking", list = "Howell1")
d_a <- Howell1
d2_a <- d_a[d_a$age >= 18, ]

## R code 4.27a ######################
flist <- alist(
  height ~ dnorm(mu, sigma),
  mu ~ dnorm(178, 20),
  sigma ~ dunif(0, 50)
)

## R code 4.30a #####################
start <- list(
  mu = mean(d2_a$height),
  sigma = sd(d2_a$height)
)

## R code 4.28a ######################
m4.1a <- rethinking::quap(flist, data = d2_a, start = start)
m4.1a
```

------------------------------------------------------------------------

The `rethinking::quap()` function returns a "map" object.

Sometimes I got an error message when computing this code chunk. The
reason was that `quap()` had chosen an inconvenient starting value for
its estimation of the posterior. I believe that one could visualize the
problem with a metaphor: Instead of climbing up the hill, `quap()`
started at a value where it was trapped in a narrow valley. Passing
explicit start values, as in **Code 4.30**, avoids this.
:::
:::

::: my-note
::: my-note-header
The three parts of `rethinking::quap()`
:::

::: my-note-container
1.  A formula or `base::alist()` of formulas that define the likelihood
    and priors.
2.  A data frame or list containing the data.
3.  Some options like start values or the method for the search
    optimization. (The example above passes start values.) Note that the
    list of start values is a regular `list()`, not an `alist()` like
    the formula list is.
:::
:::

::: my-resource
::: my-resource-header
How to use formulae in statistical models in R
:::

::: my-resource-container
To learn more about using formulae read "Statistical Models in R"
[online](https://cran.r-project.org/doc/manuals/R-intro.html#Statistical-models-in-R)
or chapter 11 in the
[PDF](https://cran.r-project.org/doc/manuals/R-intro.pdf).
:::
:::

###### precis

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-precis-m4-1a}
a: Printing with `rethinking::precis` (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-precis-m4-1a

## R code 4.29a ######################
(precis_m4.1a <- rethinking::precis(m4.1a))
```

> "These numbers provide Gaussian approximations for each parameter's
> marginal distribution. This means the plausibility of each value of μ,
> after averaging over the plausibilities of each value of σ, is given
> by a Gaussian distribution with mean 154.6 and standard deviation 0.4.
> The 5.5% and 94.5% quantiles are percentile interval boundaries,
> corresponding to an 89% compatibility interval. Why 89%? It's just the
> default. It displays a quite wide interval, so it shows a
> high-probability range of parameter values. If you want another
> interval, such as the conventional and mindless 95%, you can use
> precis(m4.1,prob=0.95). But I don't recommend 95% intervals, because
> readers will have a hard time not viewing them as significance tests.
> 89 is also a prime number, so if someone asks you to justify it, you
> can stare at them meaningfully and incant, "Because it is prime."
> That's no worse justification than the conventional justification for
> 95%." ([McElreath, 2020, p.
> 88](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=107&annotation=5ARWQP2N))
:::
:::

When you compare the 89% boundaries with the result of the grid
approximation in @exm-chap04-summarize-pi you will see that they are
almost identical as the posterior is approximately Gaussian.
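
Because the approximation is Gaussian, the interval boundaries follow
directly from the mean and the standard deviation. A small sketch with
the rounded values from the quote above (154.6 and 0.4) shows the idea.
The object names are my own, and the result will differ slightly from the
`precis()` output because of rounding.

```{r}
#| label: chap04-sketch-gaussian-interval

gauss_mean <- 154.6 # rounded posterior mean of mu from the quote
gauss_sd <- 0.4     # rounded posterior standard deviation of mu

## 89% interval: the 5.5% and 94.5% quantiles of the Gaussian
bounds_89 <- stats::qnorm(c(0.055, 0.945), mean = gauss_mean, sd = gauss_sd)

## 95% interval for comparison
bounds_95 <- stats::qnorm(c(0.025, 0.975), mean = gauss_mean, sd = gauss_sd)

round(rbind(`89%` = bounds_89, `95%` = bounds_95), 2)
```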

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-m4-1b}
b: Finding the posterior distribution with `brms::brm()`
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-m4-1b
#| cache: true

## R code 4.26b ######################
data(package = "rethinking", list = "Howell1")
d_b <- Howell1
d2_b <- 
  d_b |> 
  dplyr::filter(age >= 18) 


## R code 4.27b ######################
## R code 4.28b ######################
m4.1b <- 
  brms::brm(
      formula = height ~ 1,                                           # 1
      data = d2_b,                                                    # 2
      family = gaussian(),                                            # 3
      prior = c(brms::prior(normal(178, 20), class = Intercept),      # 4
                brms::prior(uniform(0, 50), class = sigma, ub = 50)), # 4
      iter = 2000,               # 5
      warmup = 1000,             # 6
      chains = 4,                # 7
      cores = 4,                 # 8 
      seed = 4,                  # 9
      file = "brm_fits/m04.01b") # 10
m4.1b
```

------------------------------------------------------------------------

The `brms::brm()` function returns a "brmsfit" object.

If you don't want to specify the result in more detail (for instance to
change the PI), then you get the same result with
`brms:::print.brmsfit(m4.1b)` and `brms:::summary.brmsfit(m4.1b)`.

::: my-resource
::: my-resource-header
Description of the convergence diagnostics `Rhat`, `Bulk_ESS`, and
`Tail_ESS`
:::

::: my-resource-container
-   The convergence diagnostics `Rhat`, `Bulk_ESS`, and `Tail_ESS` are
    described in detail in [@vehtari2021].
-   The code for the paper is available on Github
    (https://github.com/avehtari/rhat_ess).
-   Examples and even a larger variety of numerical experiments are
    available in the online appendix at
    https://avehtari.github.io/rhat_ess/rhat_ess.html.
:::
:::
:::
:::

`brms::brm()` has more than 40 arguments, but only three (formula, data,
and prior) are mandatory for our case. All the others have sensible
default values. The correspondence of these three arguments to the
{**rethinking**} version is obvious.

In the simple demonstration of `brms::brm()` in the toy globe example
(@cnj-chap02-brms-globe-tossing), I have just used Kurz' code lines
without any explanation. This time I will explain all 10 arguments from
Kurz' example.

------------------------------------------------------------------------

1.  **formula** describes the relation between dependent and independent
    variables in the form of a linear model. The left-hand side contains
    the dependent variables, the right-hand side the independent ones.
    The independent variables are used to calculate the trend component
    of the linear model; the residuals are then assumed to have some
    kind of distribution. When the right-hand side is just one (`~ 1`),
    the trend component is a single value, e.g. the mean value of the
    data, i.e. the linear model only has an intercept
    ([StackOverflow](https://stackoverflow.com/a/13366973/7322615)). In
    other words, it is the value the dependent variable is expected to
    have when the independent variables are zero or have no influence
    ([StackOverflow](https://stackoverflow.com/a/13367260/7322615)). The
    formula `y ~ 1` is just a model with a constant (intercept) and no
    regressor
    ([StackOverflow](https://stackoverflow.com/questions/53812741/tilde-operator-in-r)).
    More understandable for our case is Kurz' explanation: "... the
    intercept of a typical regression model with no predictors is the
    same as its mean. In the special case of a model using the binomial
    likelihood, the mean is the probability of a 1 in a given trial,
    $\theta$."
    ([Kurz](https://bookdown.org/content/4857/small-worlds-and-large-worlds.html)).
2.  **data**: A data frame that contains all the variables used in the
    model.
3.  **family**: A description of the response distribution and link
    function to be used in the model. This can be a family function, a
    call to a family function or a character string naming the family.
    By default, a linear `gaussian` model is applied, so this line would
    not have been necessary. There are [standard family functions
    `stats::family()`](https://www.stat.ethz.ch/R-manual/R-devel/library/stats/html/family.html)
    that will work with {**brms**}, but there are also [special family
    functions
    `brms::brmsfamily()`](https://paul-buerkner.github.io/brms/reference/brmsfamily.html)
    that work only for {**brms**} models. Additionally you can [specify
    custom
    families](https://paul-buerkner.github.io/brms/reference/custom_family.html)
    for use in brms with the `brms::custom_family()` function.
4.  **prior**: The next two lines specify the priors: a normal prior for
    the intercept and a uniform prior for sigma. As you can see, this is
    another place where parts of the model definition are provided to
    the `brms::brm()` function.
    -   `class` specifies the parameter class. It defaults to "b" (i.e.
        population-level, 'fixed', effects). Besides the "b" class there
        are other classes for the "Intercept" and the standard deviation
        "sigma" on the population level. There is also an "sd" class for
        the standard deviation of group-level effects. Finally, there is
        the special case of `class = "cor"` to set the same prior on
        every correlation matrix.
    -   `group` is the argument for the grouping factor of group-level
        effects. (Not used in this code example.)
    -   `ub = 50` sets the upper bound to 50. There is also `lb` (lower
        bound). Both bounds are for parameter restriction, so that
        population-level effects must fall within a certain interval.
        `lb` and `ub` default to `NULL`, i.e. there is no restriction.
5.  **iter**: Number of total iterations per chain (including `warmup`;
    defaults to 2000).
6.  **warmup**: A positive integer specifying the number of warmup
    iterations. This also specifies the number of iterations used for
    stepsize adaptation, so warmup draws should not be used for
    inference. The number of warmup iterations should not be larger than
    `iter`; the default is `iter / 2`.
7.  **chains**: Number of `r glossary("Markov chain", "Markov chains")`
    (defaults to 4).
8.  **cores**: Number of cores to use when executing the chains in
    parallel. It defaults to 1, but the help file recommends setting the
    `mc.cores` option to as many processors as the hardware and RAM
    allow (up to the number of chains).
9.  **seed**: The seed for random number generation to make results
    reproducible. In other code chunks Kurz always uses the chapter
    number for `set.seed()`. If `NA` (the default), Stan will set the
    seed randomly.
10. **file**: Either `NULL` or a character string. In the latter case,
    the fitted model object is saved via `base::saveRDS()` in a file
    named after the string supplied in file. The `.rds` extension is
    added automatically. If the file already exists, `brms::brm()` will
    load and return the saved model object instead of refitting the
    model. Unless you specify the `file_refit` argument as well, the
    existing file won't be overwritten; you have to manually remove the
    file in order to refit and save the model under an existing file
    name. The file name is stored in the brmsfit object for later usage.

-   **init**: Not used here. Within the `brm()` function, you use the
    `init` argument for the start values for the sampler. "If NULL (the
    default) or "random", Stan will randomly generate initial values for
    parameters in a reasonable range. If 0, all parameters are
    initialized to zero on the unconstrained space. This option is
    sometimes useful for certain families, as it happens that default
    random initial values cause draws to be essentially constant.
    Generally, setting init = 0 is worth a try, if chains do not
    initialize or behave well. Alternatively, init can be a list of
    lists containing the initial values ..." (Help file)
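
The statement in item 1, that a model with the formula `~ 1` only
estimates the mean, can be checked quickly without Stan. The sketch uses
`stats::lm()` on simulated values instead of `brms::brm()` on the height
data, which is an assumption made to keep the example fast and
self-contained. In the Bayesian model the prior also influences the
intercept, so the match is only approximate there.

```{r}
#| label: chap04-sketch-intercept-only

## simulated stand-in for the heights
set.seed(4)
toy_y <- stats::rnorm(50, mean = 154.6, sd = 7.7)

## a linear model with an intercept and no predictors
intercept_only <- stats::lm(toy_y ~ 1)

## the single coefficient is the sample mean
c(intercept = unname(stats::coef(intercept_only)), mean = mean(toy_y))
```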

###### print

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-print-m4-1b}
b: Print the specified results of the `brmsfit` object
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-print-m4-1b
#| results: hold

## R code 4.29b print ######################
brms:::print.brmsfit(m4.1b, prob = .89)

## brms:::summary.brmsfit(m4.1b, prob = .89) # (same result)
```

------------------------------------------------------------------------

`brms:::summary.brmsfit()` results in the same output as
`brms:::print.brmsfit()`. Both return a `brmssummary` object. But
there are some internal differences:

> There is also a `$print()` method that prints the same summary stats
> but removes the extra formatting used for printing tibbles and returns
> the fitted model object itself. The `$print()` method may also be
> faster than `$summary()` because it is designed to only compute the
> summary statistics for the variables that will actually fit in the
> printed output whereas `$summary()` will compute them for all of the
> specified variables in order to be able to return them to the user.

Using `print.brmsfit()` or `summary.brmsfit()` defaults to 95%
intervals. As {**rethinking**} defaults to 89% intervals, I have changed
the `prob` parameter of the print method also to 89%.

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Using three colons to address generic functions of S3 classes
:::

::: my-watch-out-container
As I have learned recently, `print()` and `summary()` are generic
functions to which one can add new methods for new classes. In
this case `class(m4.1b)` = `r class(m4.1b)`. This means I do not need to
add `brms::` to make sure that I get the {**brms**} print or summary
method, even though I didn't load the {**brms**} package. Quite the
contrary: Adding `brms::` would result in the message: "Error:
'summary' is not an exported object from 'namespace:brms'".

As I really want to specify explicitly the method these generic
functions should use, I need to use the syntax with *three* colons, like
`brms:::print.brmsfit()` or `brms:::summary.brmsfit()` respectively.
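
How the dispatch works can be tried out with a toy class. The class
`toyfit` and its print method are hypothetical and have nothing to do
with {**brms**}. They only show that calling the generic and calling the
method directly lead to the same function.

```{r}
#| label: chap04-sketch-s3-dispatch

## an object with a made-up S3 class
toy_fit <- structure(list(estimate = 154.6), class = "toyfit")

## a print method for this class: generic name, a dot, class name
print.toyfit <- function(x, ...) {
  cat("toyfit with estimate ", x$estimate, "\n", sep = "")
  invisible(x)
}

## the generic print() picks the method from the class attribute
printed_generic <- utils::capture.output(print(toy_fit))

## calling the method directly gives the same result
printed_direct <- utils::capture.output(print.toyfit(toy_fit))

identical(printed_generic, printed_direct)
```

In a package the method can be registered for the generic without being
exported. That is why `print(m4.1b)` works while the method itself is
only reachable with three colons.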

::: my-resource
::: my-resource-header
Learning more about S3 classes in R
:::

::: my-resource-container
In this respect I have to learn more about S3 classes. There are many
important web resources about this subject that I have found with the
search string "r what is s3 class". Maybe I should start with the [S3
chapter in Advanced R](https://adv-r.hadley.nz/s3.html).
:::
:::
:::
:::
:::
:::

For the interpretation of this output I am going to use the explanation
in the [How to use
brms](https://github.com/paul-buerkner/brms#how-to-use-brms) section of
the {**brms**} GitHub page.

1.  **Top**: On the top of the output, some general information on the
    model is given, such as family, formula, number of iterations and
    chains.
2.  **Upper Middle**: If the data were grouped the next part would
    display group-level effects separately for each grouping factor in
    terms of standard deviations and (in case of more than one
    group-level effect per grouping factor) correlations between
    group-level effects. (This part is absent above as there are no
    grouping factors.)
3.  **Lower Middle (here: Middle)**: Next follows the display of the
    population-level effects (i.e. regression coefficients). If
    incorporated, autocorrelation effects and family specific parameters
    (e.g., the residual standard deviation 'sigma' in normal models) are
    also given. In general, every parameter is summarized using the mean
    (`Estimate`) and the standard deviation (`Est.Error`) of the
    posterior distribution as well as two-sided 95% credible intervals
    (`l-95% CI` and `u-95% CI`) based on quantiles. The last three
    values (`Rhat`, `Bulk_ESS`, and `Tail_ESS`) provide information on
    how well the algorithm could estimate the posterior distribution of
    this parameter. If `Rhat` is considerably greater than 1, the
    algorithm has not yet converged and it is necessary to run more
    iterations and / or set stronger priors.
4.  **Bottom**: The last part is a short explanation of the sampling
    procedure. `r glossary("NUTS")` stands for **No U-Turn Sampler** and
    is a Hamiltonian Monte Carlo (`r glossary("HMC")`) method. It is
    still a `r glossary("Markov Chain")` Monte Carlo method, but it
    avoids the random walk behavior of simpler samplers, which is often
    deemed inefficient and slow to converge. Instead of doing a random
    walk, NUTS simulates a trajectory through the parameter space. The
    length of the trajectory doubles repeatedly until the trajectory
    reaches a point where it starts to turn back towards the starting
    point (the "U-turn").
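
To get a feeling for what `Rhat` measures, here is a sketch of the
classic between-chain versus within-chain comparison. As far as I know,
current {**brms**} versions report a refined, rank-normalized split
version of this statistic, so the function below is an assumption-laden
simplification and will not reproduce the printed values. The simulated
chains and all names are made up for this illustration.

```{r}
#| label: chap04-sketch-basic-rhat

## draws: matrix with one row per iteration and one column per chain
basic_rhat <- function(draws) {
  n_iter <- nrow(draws)
  within_var <- mean(apply(draws, 2, stats::var))   # average variance inside chains
  between_var <- n_iter * stats::var(colMeans(draws)) # variance of the chain means
  pooled_var <- (n_iter - 1) / n_iter * within_var + between_var / n_iter
  sqrt(pooled_var / within_var)
}

set.seed(4)
## four simulated chains that all explore the same distribution
mixed_chains <- matrix(stats::rnorm(4000, mean = 154.6, sd = 0.4), ncol = 4)

## the same chains, but the fourth one is stuck in a different region
stuck_chains <- sweep(mixed_chains, 2, c(0, 0, 0, 2), "+")

c(mixed = basic_rhat(mixed_chains), stuck = basic_rhat(stuck_chains))
```

When all chains agree, the pooled variance is about as large as the
variance within a single chain and the ratio is close to 1. A chain that
sits somewhere else inflates the pooled variance and pushes the value
well above 1.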

###### fit

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-fit-m4-1b}
b: Print a Stan like summary
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-fit-m4-1b


## R code 4.29b stan like summary ######################
m4.1b$fit
```

Kurz refers to [RStan: the R interface to
Stan](https://cran.r-project.org/web/packages/rstan/vignettes/rstan.html)
for a detailed description. I didn't study the extensive documentation
but I found out two items:

-   `lp__` is the logarithm of the (unnormalized) posterior density as
    calculated by Stan. This log density can be used in various ways for
    model evaluation and comparison.
-   `Rhat` estimates the degree of convergence of a random Markov Chain
    based on the stability of outcomes between and within chains of the
    same length. Values close to one indicate convergence to the
    underlying distribution. Values greater than 1.1 indicate inadequate
    convergence.
:::
:::

###### plot

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-plot-m4-1b}
b: Plot the visual chain diagnostics of the `brmsfit` object
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-plot-m4-1b
#| fig-cap: "Plot model m4.1b"

## R code 4.29b plot ######################
brms:::plot.brmsfit(m4.1b)
```

> After running a model fit with `r glossary("HMC")`, it's a good idea
> to inspect the chains. As we'll see, McElreath covered visual chain
> diagnostics in @sec-chap09. ... If you want detailed diagnostics for
> the HMC chains, call `launch_shinystan(m4.1b)`.
> ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#finding-the-posterior-distribution-with-quap-brm.))

Just a quick preview: It is important that the width of the trace is
constant over the whole length of the samples (not counting the warmup,
in our case 1000 iterations) and that all (four) chains show up at the
top as well as at the bottom of the trace. Unfortunately, looking at the
(small) graph is not always conclusive. (See also tab "trace plot" in
@exm-find-post-dist-m4-3.)

::: my-resource
::: my-resource-header
Learn more about {**shinystan**}
:::

::: my-resource-container
I haven't applied `launch_shinystan(m4.1b)` as it takes much time and I
do not (yet) understand the detailed report anyway. To learn to work
with {**shinystan**} see the [ShinyStan
website](https://mc-stan.org/users/interfaces/shinystan), the vignettes
of the R package ([Deploying to
shinyapps.io](https://rdrr.io/cran/shinystan/f/vignettes/deploy_shinystan.Rmd),
[Getting
Started](https://rdrr.io/cran/shinystan/f/vignettes/shinystan-package.Rmd))
and the package documentation.
:::
:::
:::
:::
:::
:::
:::

::: my-resource
::: my-resource-header
Package documentation of Stan and friends
:::

::: my-resource-container
-   [Interface to
    shinystan](https://paul-buerkner.github.io/brms/reference/launch_shinystan.brmsfit.html)
    (`brms::launch_shinystan`)
-   I believe that it is also very important to understand [RStan: the R
    interface to
    Stan](https://cran.r-project.org/web/packages/rstan/vignettes/rstan.html).
-   Possibly I should read [Using the ShinyStan GUI with rstanarm
    models](https://mc-stan.org/rstanarm/reference/launch_shinystan.stanreg.html).
-   And maybe it could also be helpful to read selected chapters from
    the [rstanarm
    documentation](https://mc-stan.org/rstanarm/index.html), from the
    [{**rstan**} documentation](https://mc-stan.org/rstan/) or generally
    from the [Stan User's
    Guide](https://mc-stan.org/docs/stan-users-guide/index.html) or the
    [Stan Language Reference
    Manual](https://mc-stan.org/docs/reference-manual/index.html).

Oops, this opens up Pandora's box!

I do not even understand completely what the different packages do. What
follows is a first try where I copied from the documentation pages:

> {**rstanarm**} is an R package that emulates other R model-fitting
> functions but uses Stan (via the {**rstan**} package) for the back-end
> estimation. The primary target audience is people who would be open to
> Bayesian inference if using Bayesian software were easier but would
> use frequentist software otherwise.

> RStan is the R interface to Stan. It is distributed on CRAN as the
> {**rstan**} package and its source code is hosted on GitHub.

> Stan is a state-of-the-art platform for statistical modeling and
> high-performance statistical computation.
:::
:::

The next example shows the effect of a very narrow $\mu$ prior. Instead
of a standard deviation of 20, the prior for $\mu$ now has a standard
deviation of only 0.1.

::: my-example
::: my-example-header
::: {#exm-chap04-narrow-mu-prior-m4-2}
: The same model but with a more informative, i.e., very narrow $\mu$
prior
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-narrow-mu-prior-m4.2a}
a: Model with a very narrow $\mu$ prior (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-narrow-mu-prior-m4.2a
#| cache: true


## R code 4.27a ######################
flist <- alist(
  height ~ dnorm(mu, sigma),
  mu ~ dnorm(178, .1),
  sigma ~ dunif(0, 50)
)

## R code 4.30a #####################
start <- list(
  mu = mean(d2_a$height),
  sigma = sd(d2_a$height)
)

## R code 4.28a ######################
m4.2a <- rethinking::quap(flist, data = d2_a, start = start)
rethinking::precis(m4.2a)
```

> "Notice that the estimate for μ has hardly moved off the prior. The
> prior was very concentrated around 178. So this is not surprising. But
> also notice that the estimate for σ has changed quite a lot, even
> though we didn't change its prior at all. Once the golem is certain
> that the mean is near 178---as the prior insists---then the golem has
> to estimate σ conditional on that fact. This results in a different
> posterior for σ, even though all we changed is prior information about
> the other parameter." ([McElreath, 2020, p.
> 89](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=108&annotation=KW4QU3CM))
:::
:::

###### brms

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-narrow-mu-prior-m4.2b}
b: Model with a very narrow $\mu$ prior (brms)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-narrow-mu-prior-m4.2b

m4.2b <- 
  brms::brm(
      formula = height ~ 1,                               
      data = d2_b,                                                   
      family = gaussian(),      
      prior = c(brms::prior(normal(178, 0.1), class = Intercept),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.02b")

base::rbind(
      brms:::summary.brmsfit(m4.1b, prob = .89)$fixed,
      brms:::summary.brmsfit(m4.2b, prob = .89)$fixed
      )
```

Subsetting the `summary.brmsfit()` output of the `brmssummary` object
with `$fixed` provides a convenient way to compare the Intercept
summaries between `m4.1b` and `m4.2b`.
:::
:::
:::
:::
:::
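
Why $\sigma$ has to grow when $\mu$ is pinned near 178 follows from a
simple identity: the mean squared distance of the heights $h_i$ from any
fixed value is the mean squared distance from their own mean $\bar{h}$
plus the squared distance between the two values,

$$
\frac{1}{n}\sum_{i=1}^{n}(h_i - 178)^2 =
\frac{1}{n}\sum_{i=1}^{n}(h_i - \bar{h})^2 + (\bar{h} - 178)^2 .
$$

The sketch below checks this with simulated heights. The data and the
object names are assumptions for illustration, and the two values are
plain maximum likelihood estimates, not the posterior summaries of
`m4.1a` and `m4.2a`.

```{r}
#| label: chap04-sketch-pinned-mu

## simulated stand-in for the adult heights
set.seed(4)
toy_adults <- stats::rnorm(350, mean = 154.6, sd = 7.7)

## sigma when mu is free to sit at the sample mean
sigma_free <- sqrt(mean((toy_adults - mean(toy_adults))^2))

## sigma when mu is forced to 178, as the narrow prior insists
sigma_pinned <- sqrt(mean((toy_adults - 178)^2))

c(sigma_free = sigma_free, sigma_pinned = sigma_pinned)
```

The data have not changed, but with the mean in the wrong place the
model needs a much larger standard deviation to cover them.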

### Sampling

#### Sampling from `rethinking::quap()`

How do we get samples from the quadratic approximate posterior
distribution?

> "... a quadratic approximation to a posterior distribution with more
> than one parameter dimension---μ and σ each contribute one
> dimension---is just a multi-dimensional Gaussian distribution."
> ([McElreath, 2020, p.
> 90](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=109&annotation=B666RAFR))

> "As a consequence, when R constructs a quadratic approximation, it
> calculates not only standard deviations for all parameters, but also
> the covariances among all pairs of parameters. Just like a mean and
> standard deviation (or its square, a variance) are sufficient to
> describe a one-dimensional Gaussian distribution, a list of means and
> a matrix of variances and covariances are sufficient to describe a
> multi-dimensional Gaussian distribution." ([McElreath, 2020, p.
> 90](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=109&annotation=XN54B26K))

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Rethinking and Tidyverse (brms) code in separate examples
:::

::: my-watch-out-container
As there are quite big differences in the calculation of the
`r glossary("variance var", "variance")`-`r glossary("covariance cov", "covariance")`
matrix, I will explain the appropriate steps in different examples.
:::
:::

::: my-example
::: my-example-header
::: {#exm-chap04-sampling-quap-m4-1a}
a: Sampling from a `quap()` (Original)
:::
:::

::: my-example-container
::: panel-tabset
###### vcov

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-vcov-a}
a: Variance-covariance matrix (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-vcov-m4-1a

## R code 4.32a vcov #########
rethinking::vcov(m4.1a)
```

> "The above is a variance-covariance matrix. It is the
> multi-dimensional glue of a quadratic approximation, because it tells
> us how each parameter relates to every other parameter in the
> posterior distribution. A variance-covariance matrix can be factored
> into two elements: (1) a vector of variances for the parameters and
> (2) a correlation matrix that tells us how changes in any parameter
> lead to correlated changes in the others. This decomposition is
> usually easier to understand." ([McElreath, 2020, p.
> 90](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=109&annotation=69ZF8LKP))

`vcov()` returns the variance-covariance matrix of the main parameters
of a fitted model object. The {**rethinking**} version above uses its
method for class `map`, as `m4.1a` is of class `map`.
:::
:::

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Two different `vcov()` functions
:::

::: my-watch-out-container
I am explicitly using the {**rethinking**} package for the `vcov()`
function. A function of the same name is also available in base R as
`stats::vcov()`. But calling it generates an error, because it knows no
method for an object of class `map` from the {**rethinking**} package.
The help file for `stats::vcov()` only lists S3 methods for the classes
`lm`, `glm`, `mlm` and `aov`, but not for `map`.

> Error in UseMethod("vcov") : no applicable method for 'vcov' applied
> to an object of class "map"

I could have used just `vcov()`. But this only works when the
{**rethinking**} package is already loaded. In that case R chooses the
appropriate `vcov()` method by the class of the object. Here:
`class(m4.1a)` = `r class(m4.1a)`.
:::
:::

###### var

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-var-a}
a: List of variances (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-var-m4-1a

## R code 4.33a var adapted ##########
(var_list <- base::diag(rethinking::vcov(m4.1a)))

```

> "The two-element vector in the output is the list of variances. If you
> take the square root of this vector, you get the standard deviations
> that are shown in `rethinking::precis()` (@exm-chap04-m4-1) output."
> ([McElreath, 2020, p.
> 90](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=109&annotation=TAVMWSMM))

Let's check this out:

| Result / Parameter             | mu sd, sigma sd                  |
|--------------------------------|----------------------------------|
| `sqrt(base::unname(var_list))` | `r sqrt(base::unname(var_list))` |
| `precis_m4.1a[["sd"]]`         | `r precis_m4.1a[["sd"]]`         |

: Convert list of variances to standard deviations and compare with the
precis result
:::
:::

###### cor

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-cor-a}
a: Correlation matrix (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-cor-m4-1a

## R code 4.33a cor #############
stats::cov2cor(rethinking::vcov(m4.1a))
```

> "The two-by-two matrix in the output is the correlation matrix. Each
> entry shows the correlation, bounded between −1 and +1, for each pair
> of parameters. The 1's indicate a parameter's correlation with itself.
> If these values were anything except 1, we would be worried. The other
> entries are typically closer to zero, and they are very close to zero
> in this example. This indicates that learning μ tells us nothing about
> σ and likewise that learning σ tells us nothing about μ. This is
> typical of simple Gaussian models of this kind. But it is quite rare
> more generally, as you'll see in later chapters." ([McElreath, 2020,
> p. 90](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=109&annotation=Y9WJAQYQ))
:::
:::

###### cor to cov

::: my-r-code
::: my-r-code-header
<div>

a: Compute covariance matrix from the correlation using `base::sweep()`
(Original)

</div>
:::

::: my-r-code-container
```{r}
#| label: chap04-cov-to-cor-m4-1a

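R <- stats::cov2cor(rethinking::vcov(m4.1a))    # correlation matrix
S <- sqrt(base::diag(rethinking::vcov(m4.1a)))  # standard deviations

## scale the rows, then the columns, by the standard deviations
base::sweep(base::sweep(R, 1, S, "*"), 2, S, "*")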
```

I wondered how to compute the correlation matrix by hand from the
variance-covariance matrix. I thought that I had to use `sqrt()`, but it
didn't work. After inspecting the code of the `cov2cor()` function I
noticed that it uses the expression `sqrt(1/diag(V))`.

From the `stats::cor()` help file:

> Scaling a covariance matrix into a correlation one can be achieved in
> many ways, mathematically most appealing by multiplication with a
> diagonal matrix from left and right, or more efficiently by using
> `base::sweep(.., FUN = "/")` twice. The `stats::cov2cor()` function is
> even a bit more efficient, and provided mostly for didactical reasons.

For computing the covariance matrix with `base::sweep()` see this answer
on [Cross Validated](https://stats.stackexchange.com/a/407954/207389).
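
The relation between the two matrices can also be sketched by hand. The
following is only a minimal illustration with a made-up 2 x 2 covariance
matrix (`sk_V` and all other `sk_` names are hypothetical and not taken
from the fitted model): dividing every entry by the product of the two
corresponding standard deviations gives the correlation matrix, and
multiplying again restores the covariance matrix.

```{r}
#| label: chap04-sketch-cov-cor-by-hand

## toy covariance matrix (made-up numbers, not from m4.1a)
sk_V <- base::matrix(c(0.17, 0.002,
                       0.002, 0.085), nrow = 2,
                     dimnames = list(c("mu", "sigma"), c("mu", "sigma")))

## standard deviations are the square roots of the diagonal
sk_sd <- base::sqrt(base::diag(sk_V))

## covariance -> correlation: divide entry (i, j) by sd_i * sd_j
sk_R <- sk_V / base::outer(sk_sd, sk_sd)

## correlation -> covariance: multiply entry (i, j) by sd_i * sd_j
sk_V_back <- sk_R * base::outer(sk_sd, sk_sd)

base::all.equal(sk_R, stats::cov2cor(sk_V))
base::all.equal(sk_V_back, sk_V)
```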
:::
:::

###### Samples1

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-samples1-m4-1a}
a: Samples from the multi-dimensional posterior (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-samples-m4.1a
#| warning: false

## R code 4.34a ########################
set.seed(4)
post3_a <- rethinking::extract.samples(m4.1a, n = 1e4)

bayr::as_tbl_obs(post3_a)
```

> "You end up with a data frame, post, with 10,000 (1e4) rows and two
> columns, one column for μ and one for σ. Each value is a sample from
> the posterior, so the mean and standard deviation of each column will
> be very close to the `r glossary("MAP")` values from before."
> ([McElreath, 2020, p.
> 91](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=110&annotation=4ZE9H4NT))
:::
:::

###### precis

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-precis2-m4.1a}
a: Summary from the samples of the multi-dimensional posterior
(Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-precis2-m4-1a

## R code 4.35a precis ##################
rethinking::precis(post3_a)
```

Compare these values to the output from the summaries with
`rethinking::precis()` in @exm-chap04-m4-1.
:::
:::

###### plot

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-precis-m4.1a}
a: Plot the samples distribution from the multi-dimensional posterior
(Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-plot-m4-1a
#| fig-cap: "Samples distribution from the multi-dimensional posterior"

## plot sample posterior a  ##################
plot(post3_a, col = rethinking::col.alpha(rethinking:::rangi2, 0.1))
```

> "see how much they resemble the samples from the grid approximation in
> @fig-chap04-posterior-sample-heights-a. These samples also preserve
> the covariance between $\mu$ and $\sigma$. This hardly matters right
> now, because $\mu$ and $\sigma$ don't covary at all in this model. But
> once you add a predictor variable to your model, covariance will
> matter a lot." ([McElreath, 2020, p.
> 91](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=110&annotation=2B983ITL))
:::
:::

###### Samples2

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-sample2-m4-1a}
a: Sample vectors of values from a multi-dimensional Gaussian
distribution with `MASS::mvrnorm()` and plot the result
(Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-plot2-m4.1a
#| fig-cap: "Samples distribution from a multi-dimensional Gaussian distribution with `MASS::mvrnorm()`"

## R code 4.36a ######################
post4_a <- MASS::mvrnorm(n = 1e4, mu = rethinking::coef(m4.1a), 
                      Sigma = rethinking::vcov(m4.1a))

plot(post4_a, col = rethinking::col.alpha(rethinking:::rangi2, 0.1))
```

The function `rethinking::extract.samples()` in the "Samples1" tab is
for convenience. It is just running a simple simulation of the sort you
conducted near the end of @sec-chap03 with @cnj-fig-post-pred-sim-a.

Under the hood the work of `rethinking::extract.samples()` is done by a
multi-dimensional version of `stats::rnorm()`, `MASS::mvrnorm()`. The
function `stats::rnorm()` simulates random Gaussian values, while
`MASS::mvrnorm()` simulates random vectors of multivariate Gaussian
values.
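
As a rough check of this idea, the sketch below draws from a
two-dimensional Gaussian with `MASS::mvrnorm()` and compares the sample
means and the sample covariance matrix with the inputs. The mean vector
and the covariance matrix are made-up stand-ins for `coef(m4.1a)` and
`vcov(m4.1a)`; all `sk_` names are hypothetical.

```{r}
#| label: chap04-sketch-mvrnorm-check

set.seed(4)

## made-up stand-ins for the MAP values and the variance-covariance matrix
sk_mu    <- c(mu = 154.6, sigma = 7.7)
sk_Sigma <- base::matrix(c(0.17, 0.002,
                           0.002, 0.085), nrow = 2)

## each row is one sampled (mu, sigma) pair
sk_draws <- MASS::mvrnorm(n = 1e4, mu = sk_mu, Sigma = sk_Sigma)

base::colMeans(sk_draws)   # should be close to sk_mu
stats::cov(sk_draws)       # should be close to sk_Sigma
```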
:::
:::
:::
:::
:::

::: my-note
::: my-note-header
How to interpret covariances?
:::

::: my-note-container
> A large covariance can mean a strong relationship between variables.
> However, you can't compare variances over data sets with different
> scales (like pounds and inches). A weak covariance in one data set may
> be a strong one in a different data set with different scales.

> The main problem with interpretation is that the wide range of results
> that it takes on makes it hard to interpret. For example, your data
> set could return a value of 3, or 3,000. This wide range of values is
> caused by a simple fact: *The larger the X and Y values, the larger
> the covariance*. A value of 300 tells us that the variables are
> correlated, but unlike the correlation coefficient, that number
> doesn't tell us exactly how strong that relationship is. The problem
> can be fixed by dividing the covariance by the standard deviation to
> get the correlation coefficient. ([Statistics How
> To](https://www.statisticshowto.com/probability-and-statistics/statistics-definitions/covariance/))
:::
:::
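
The scale dependence described in the note above can be illustrated with
simulated data. This is only a sketch with made-up numbers (the
variables `sk_height_cm`, `sk_weight_kg` and their rescaled versions are
hypothetical): changing the units changes the covariance but leaves the
correlation untouched. Note that the correlation is the covariance
divided by the product of both standard deviations.

```{r}
#| label: chap04-sketch-cov-scale

set.seed(4)

## simulated, made-up measurements
sk_height_cm <- stats::rnorm(200, mean = 155, sd = 8)
sk_weight_kg <- 45 + 0.6 * (sk_height_cm - 155) + stats::rnorm(200, sd = 4)

## the same data in different units
sk_height_in <- sk_height_cm / 2.54
sk_weight_lb <- sk_weight_kg * 2.20462

stats::cov(sk_height_cm, sk_weight_kg)  # covariance in cm * kg
stats::cov(sk_height_in, sk_weight_lb)  # different number, same data
stats::cor(sk_height_cm, sk_weight_kg)  # the correlation ...
stats::cor(sk_height_in, sk_weight_lb)  # ... is unchanged by rescaling

## correlation = covariance / (sd_x * sd_y)
stats::cov(sk_height_cm, sk_weight_kg) /
  (stats::sd(sk_height_cm) * stats::sd(sk_weight_kg))
```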

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Confusing `m4.1a` with `m4.3a`
:::

::: my-watch-out-container
Frankly speaking, I had trouble understanding why the correlation in
`m4.1a` is almost 0. It turned out that I unconsciously had a
correlation between height and weight in mind, an issue that is raised
later in the chapter with `m4.3a`.

`m4.1a` also results in a multi-dimensional Gaussian distribution, but
its two dimensions are the parameters $\mu$ and $\sigma$ of the height
distribution, not height and weight. Knowing $\mu$ of the height
distribution does not help you in the estimation of $\sigma$ in this
distribution, and vice versa.
:::
:::

#### Sampling from a `brms::brm()` fit

In contrast to {**rethinking**}, {**brms**} doesn't seem to have the
same convenience functions, and therefore we have to use different
workarounds to get the same results. The best strategy to get equivalent
output is to put the Hamiltonian Monte Carlo (`r glossary("HMC")`)
chains in a data frame and then apply the appropriate functions.

::: my-example
::: my-example-header
::: {#exm-chap04-sampling-brm-m4-1b}
b: Sampling with `as_draws()` functions from a `brms::brm()` fit
:::
:::

::: my-example-container
::: panel-tabset
###### vcov1

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-vcov-m4-1b}
b: Variance-covariance matrix (brms)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-vcov1-m4-1b

brms:::vcov.brmsfit(m4.1b)
```
:::
:::

The `vcov()` method for `brmsfit` objects only returns the first element
of the matrix we got with {**rethinking**}. That is, it appears
`brms:::vcov.brmsfit()` only returns the variance-covariance matrix for
the single-level $\beta$ parameters.

If we want the same information as with `rethinking::vcov()`, we have to
put the `r glossary("HMC")` chains in a data frame with the
`brms::as_draws_df()` function as shown in the next tab "draws".

###### draws

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-draws-m4-1b}
b: Extract the iterations of the Hamiltonian Monte Carlo (HMC) chains
into a data frame (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-draws-m4-1b
#| warning: false

set.seed(4)
post_b <- brms::as_draws_df(m4.1b)

bayr::as_tbl_obs(post_b)
```
:::
:::

::: my-note
::: my-note-header
Family of `as_draws()` functions and the {**posterior**} package
:::

::: my-note-container
The functions of the `as_draws()` family transform `brmsfit` objects to
`draws` objects, a format supported by the {**posterior**} package.
{**brms**} currently imports the family of `as_draws()` functions from
the {**posterior**} package, a tool for working with posterior
distributions, i.e. for fitting Bayesian models or working with output
from Bayesian models. (See as an introduction [The posterior R
package](https://mc-stan.org/posterior/articles/posterior.html).)

------------------------------------------------------------------------

`as_draws_df()` is only one member of this family. The class of `post_b`
shows what it has created:

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-class-as-draws}
b: Class of the object returned by `as_draws_df()` (brms)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-class-as-draws-b

class(post_b)
```
:::
:::

Besides the classes `tbl_df` and `tbl` ([subclasses of data.frame with
different
behavior](https://tibble.tidyverse.org/reference/tbl_df-class.html)),
the `as_draws_df()` function has created the `draws` class, the parent
class of all supported [draws
formats](https://mc-stan.org/posterior/articles/posterior.html#draws-formats).
:::
:::

###### vcov2

::: my-r-code
::: my-r-code-header
<div>

b: Vector of variances for `b_Intercept` and $\sigma$ (Tidyverse)

</div>
:::

::: my-r-code-container
```{r}
#| label: chap04-vcov2-m4-1b

dplyr::select(post_b, b_Intercept:sigma) |>
  stats::cov() |>
  base::diag()
```

This result is the equivalent of `base::diag(rethinking::vcov(m4.1a))`
in panel "var" of @exm-chap04-sampling-quap-m4-1a.
:::
:::

###### cov

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-cov-m4-1b}
b: Covariance matrix (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-cov-m4.1b

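## full covariance matrix of both parameters, computed from the draws
post_b |>
  dplyr::select(b_Intercept, sigma) |>
  stats::cov()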
```
:::
:::

###### cor

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-cor-m4-1b}
b: Correlation matrix (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-cor-m4-1b

post_b |>
  dplyr::select(b_Intercept, sigma) |>
  stats::cor()
```
:::
:::

###### str

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-str-m4-1b}
b: Structure of the `post_b` draws object (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-str-m4-1b

utils::str(post_b)
```

> The `post_b` object is not just a data frame, but also of class
> `draws_df`, which means it contains three metadata variables ----
> `.chain`, `.iteration`, and `.draw` --- which are often hidden from
> view, but are there in the background when needed. As you'll see,
> we'll make good use of the `.draw` variable in the future. Notice how
> our post data frame also includes a vector named `lp__`. That's the
> log posterior. (Kurz,
> [Sec.4.3.6](https://bookdown.org/content/4857/geocentric-models.html#sampling-from-a-quap-brm-fit.))

For details, see:

-   The [Log-Posterior (function and
    gradient)](https://cran.r-project.org/web/packages/rstan/vignettes/rstan.html#the-log-posterior-function-and-gradient)
    section of the Stan Development Team's (2023) vignette [RStan: the R
    interface to
    Stan](https://cran.r-project.org/web/packages/rstan/vignettes/rstan.html)
    and
-   Stephen Martin's [explanation of the log
    posterior](https://discourse.mc-stan.org/t/basic-question-what-is-lp-in-posterior-samples-of-a-brms-regression/17567/2)
    on the Stan Forums.
:::
:::

###### summary1

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-summary1-m4-1b}
b: Summarize the extracted iterations of the HMC chains:
`base::summary()` version
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-base-summary1-m4.1b

base::summary(post_b[, 1:2])
```
:::
:::

###### summary2

::: my-r-code
::: my-r-code-header
<div>

b: Summarize the extracted iterations of the HMC chains: {**posterior**}
version

</div>
:::

::: my-r-code-container
```{r}
#| label: chap04-posterior-summary2-m4.1b

posterior:::summary.draws(post_b[, 1:2])
```
:::
:::

###### skim

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-skim1-m4-1b}
b: Summarize with `skimr::skim()`
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-skim-m4.1b


skimr::skim(post_b[, 1:2])
```
:::
:::

::: my-note
::: my-note-header
Create summaries of samples that include tiny histograms
:::

::: my-note-container
Kurz doesn't use `skimr::skim()` to get tiny histograms as part of the
summary, but proposes several other methods:

-   A base R approach by using the transpose of a `stats::quantile()`
    call nested within `base::apply()`
-   A {**tidyverse**} approach
-   A {**brms**} approach by just putting the `brm()` fit object into
    `posterior_summary()`
-   A {**tidybayes**} approach using `tidybayes::mean_hdi()` if you're
    willing to drop the posterior `sd` and
-   Using additionally the [function
    `histospark()`](https://github.com/hadley/precis/blob/master/R/histospark.R)
    (from the unfinished {**precis**} package by Hadley Wickham supposed
    to replace `base::summary()`) to get the tiny histograms and to add
    them into the tidyverse approach.
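
The first of these alternatives can be sketched as follows. Because this
is only an illustration, it uses a made-up data frame of simulated draws
(`sk_post`, a hypothetical stand-in for the two parameter columns of
`post_b`) instead of the fitted model.

```{r}
#| label: chap04-sketch-quantile-apply

set.seed(4)

## made-up draws standing in for b_Intercept and sigma
sk_post <- base::data.frame(
  b_Intercept = stats::rnorm(4000, mean = 154.6, sd = 0.41),
  sigma       = stats::rnorm(4000, mean = 7.7, sd = 0.29)
)

## quantiles per column (MARGIN = 2), transposed so that parameters are rows
sk_summary <- base::t(base::apply(sk_post, MARGIN = 2, FUN = stats::quantile,
                                  probs = c(0.055, 0.5, 0.945)))
sk_summary
```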
:::
:::
:::
:::
:::

## Linear prediction {#sec-linear-prediction-a}

> "What we've done above is a Gaussian model of height in a population
> of adults. But it doesn't really have the usual feel of "regression"
> to it. Typically, we are interested in modeling how an outcome is
> related to some other variable, a `r glossary("predictor variable")`.
> If the predictor variable has any statistical association with the
> outcome variable, then we can use it to predict // the outcome. When
> the predictor variable is built inside the model in a particular way,
> we'll have `r glossary("linear regression")`." ([McElreath, 2020, p.
> 91/92](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=111&annotation=UYJ937ER))

::: my-example
::: my-example-header
::: {#exm-scatterplot-adult-height-weight}
: Scatterplot of adult height versus weight
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-scatterplot-adult-height-weight-a}
a: Scatterplot of adult height versus weight (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-height-against-weight-a
#| fig-cap: "Adult height and weight against one another"

## R code 4.37a #####################
plot(d2_a$height ~ d2_a$weight)
```
:::
:::

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-scatterplot-adult-height-weight-b}
b: Scatterplot of adult height versus weight (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-height-against-weight-b
#| fig-cap: "Adult height and weight against one another"


## R code 4.37b #####################
d2_b |> 
    ggplot2::ggplot(ggplot2::aes(x = weight, y = height)) + 
    ggplot2::geom_point() +
    ggplot2::theme_bw()
```
:::
:::
:::

There's obviously a relationship: Knowing a person's weight helps to
predict height.
:::
:::

> "To make this vague observation into a more precise quantitative model
> that relates values of `weight` to plausible values of `height`, we
> need some more technology. How do we take our Gaussian model from
> @sec-gaussian-model-of-height and incorporate predictor variables?"
> ([McElreath, 2020, p.
> 92](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=111&annotation=BEMZLGBA))

### The linear model strategy

#### Model definition

> "The `r glossary("linear model")` strategy instructs the golem to
> assume that the predictor variable has a constant and additive
> relationship to the mean of the outcome. The golem then computes the
> posterior distribution of this constant relationship." ([McElreath,
> 2020, p. 92](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=111&annotation=FU3R3CJZ))

> "For each combination of values, the machine computes the posterior
> probability, which is a measure of relative plausibility, given the
> model and data. So the posterior distribution ranks the infinite
> possible combinations of parameter values by their logical
> plausibility." ([McElreath, 2020, p.
> 92](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=111&annotation=28Z2C2SY))

::: my-example
::: my-example-header
::: {#exm-lm-height-weight}
: Linear model definition: Height against weight
:::
:::

::: my-example-container
::: panel-tabset
###### only height

::: my-theorem
::: my-theorem-header
<div>

: Define the linear heights model

</div>
:::

::: my-theorem-container
$$
\begin{align*}
h_{i} \sim \operatorname{Normal}(\mu, \sigma) \space \space (1) \\ 
\mu \sim \operatorname{Normal}(178, 20)  \space \space (2) \\ 
\sigma \sim \operatorname{Uniform}(0, 50)   \space \space (3)      
\end{align*}
$$ {#eq-height-linear-model2-m4-1}

This repeats @eq-height-linear-model-m4-1.
:::
:::

1.  **Likelihood**: Represented by the first line.
2.  **Mean prior**: Second line is the chosen $\mu$ (mu, mean) prior. It
    is a broad Gaussian prior, centered on 178 cm, with 95% of
    probability between 178 ± 40 cm.
3.  **Standard deviation prior**: Third line is the chosen $\sigma$
    (sigma, standard deviation) prior.

###### height against weight

::: my-theorem
::: my-theorem-header
<div>

: Define the linear model heights against weights (V1)

</div>
:::

::: my-theorem-container
$$
\begin{align*}
h_{i} \sim \operatorname{Normal}(\mu_{i}, \sigma) \space \space (1) \\ 
\mu_{i} = \alpha + \beta(x_{i}-\overline{x}) \space \space (2) \\
\alpha \sim \operatorname{Normal}(178, 20) \space \space (3)  \\ 
\beta \sim \operatorname{Normal}(0,10) \space \space (4) \\
\sigma \sim \operatorname{Uniform}(0, 50) \space \space (5)      
\end{align*}
$$ {#eq-height-weight-linear-model1-m4-3}

Compare this with the definition in the tab "only height".
:::
:::

(1) **Likelihood (Probability of the data)**: The first line is nearly
    identical to before, except now there is a little index $i$ on the
    $\mu$ as well as on the $h$. You can read $h_{i}$ as "each height" and
    $\mu_{i}$ as "each $\mu$". The mean $\mu$ now depends upon unique values
    on each row $i$. So the little $i$ on $\mu_{i}$ indicates that *the
    mean depends upon the row*.

(2) **Linear model**: The mean $μ$ is no longer a parameter to be
    estimated. Rather, as seen in the second line of the model,
    $\mu_{i}$ is constructed from other parameters, $\alpha$ and
    $\beta$, and the observed variable $x$. This line is not a
    stochastic relationship (there is no `~` in it, but rather an
    `=`), because the definition of $\mu_{i}$ is
    deterministic. That is to say that, once we know $\alpha$ and
    $\beta$ and $x_{i}$, we know $\mu_{i}$ with certainty. (More details
    follow in @sec-chap04-linear-model.)

(3) **Priors** for $\alpha, \beta, \sigma$ (lines 3, 4 and 5): The
    remaining lines in the model define distributions for the unobserved
    variables. These variables are commonly known as parameters, and
    their distributions as priors. There are three parameters:
    $\alpha, \beta, \sigma$. You've seen priors for $\alpha$ and
    $\sigma$ before, although $\alpha$ was called $\mu$ back then. (More
    details in @sec-chap04-priors.)
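
The deterministic second line of the model can be checked numerically.
The sketch below evaluates it for made-up values of $\alpha$, $\beta$
and weight (all `sk_` names and numbers are hypothetical, chosen only
for illustration): at the mean weight the expected height equals
$\alpha$, and every additional unit of weight changes the expected
height by $\beta$.

```{r}
#| label: chap04-sketch-linear-model-mu

## made-up parameter values and weights (kg)
sk_alpha  <- 154
sk_beta   <- 0.9
sk_weight <- c(35, 40, 45, 50, 55)
sk_xbar   <- base::mean(sk_weight)

## deterministic linear model: mu_i = alpha + beta * (x_i - xbar)
sk_mu_i <- sk_alpha + sk_beta * (sk_weight - sk_xbar)
sk_mu_i

## at x_i = xbar the expected height equals alpha
sk_mu_i[sk_weight == sk_xbar]

## one more unit of weight changes the expected height by beta
base::diff(sk_mu_i) / base::diff(sk_weight)
```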
:::
:::
:::

#### Linear model {#sec-chap04-linear-model}

> "The value $x_{i}$ [in the second line of
> @eq-height-weight-linear-model1-m4-3] is just the weight value on
> row $i$. It refers to the same individual as the height value,
> $h_{i}$, on the same row. The parameters $\alpha$ and $\beta$ are more
> mysterious. Where did they come from? We made them up. The parameters
> $\mu$ and $\sigma$ are necessary and sufficient to describe a Gaussian
> distribution. But $\alpha$ and $\beta$ are instead devices we invent
> for manipulating $\mu$, allowing it to vary systematically across
> cases in the data." ([McElreath, 2020, p.
> 93](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=112&annotation=V3R75T23))

The second line $\mu_{i} = \alpha + \beta(x_{i}-\overline{x})$ of the
model definition in @eq-height-weight-linear-model1-m4-3 tells us

> "that you are asking two questions about the mean of the outcome:

> 1.  What is the expected height when $x_{i} = \overline{x}$? The
>     parameter $\alpha$ answers this question, because when
>     $x_{i} = \overline{x}$, $\mu_{i} = \alpha$. For this reason,
>     $\alpha$ is often called the `r glossary("intercept")`. But we
>     should think not in terms of some abstract line, but rather in
>     terms of the meaning with respect to the observable variables.
> 2.  What is the change in expected height, when $x_{i}$ changes by 1
>     unit? The parameter $\beta$ answers this question. It is often
>     called a `r glossary("slope")`, again because of the abstract
>     line. Better to think of it as a rate of change in expectation.
>
> Jointly these two parameters ask the golem to find a line that relates
> x to h, a line that passes through α when $x_{i} = \overline{x}$ and
> has slope $\beta$." ([McElreath, 2020, p.
> 94](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=113&annotation=Y36FAY8E))

#### Priors {#sec-chap04-priors}

> "The prior for $\beta$ in @eq-height-weight-linear-model1-m4-3
> deserves explanation. Why have a Gaussian prior with mean zero? This
> prior places just as much probability below zero as it does above
> zero, and when $\beta = 0$, // weight has no relationship to height.
> To figure out what this prior implies, we have to simulate the prior
> predictive distribution." ([McElreath, 2020, p.
> 94/95](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=114&annotation=DD56M4KY))

The goal in @exm-sim-height-m4-3 is to simulate heights from the model,
using only the priors.

::: my-example
::: my-example-header
::: {#exm-sim-height-m4-3}
: Simulating heights from the model, using only the priors
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-sim-height-m4-3a}
a: Simulating heights from the model, using only the priors (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-sim-heights-m4-3a
#| fig-cap: "Simulating heights from the model, using only the priors (Original)"

## R code 4.38a #####################
N_100_a <- 100 # number of lines to simulate

## set seed for exact reproduction
set.seed(2971)

## simulate lines implied by the priors for alpha and beta
a <- rnorm(N_100_a, 178, 20)
b <- rnorm(N_100_a, 0, 10)


## R code 4.39a #####################
plot(NULL,
  xlim = range(d2_a$weight), ylim = c(-100, 400),
  xlab = "weight", ylab = "height"
)

## added reference lines for 0 and for the tallest man ever
abline(h = 0, lty = 2)
abline(h = 272, lty = 1, lwd = 0.5)
mtext("b ~ dnorm(0,10)")
xbar <- mean(d2_a$weight)
for (i in 1:N_100_a) {
  curve(a[i] + b[i] * (x - xbar),
    from = min(d2_a$weight), to = max(d2_a$weight), add = TRUE,
    col = rethinking::col.alpha("black", 0.2)
  )
}

```
:::
:::

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-sim-height-m4-3b}
b: Simulating heights from the model, using only the priors (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-sim-heights-m4-3b
#| fig-cap: "Simulating heights from the model, using only the priors (Tidyverse)"

set.seed(2971)
# how many lines would you like?
n_lines <- 100

lines <-
  tibble::tibble(n = 1:n_lines,
         a = stats::rnorm(n_lines, mean = 178, sd = 20),
         b = stats::rnorm(n_lines, mean = 0, sd = 10)) |> 
  tidyr::expand_grid(weight = base::range(d2_b$weight)) |> 
  dplyr::mutate(height = a + b * (weight - base::mean(d2_b$weight)))


lines |> 
  ggplot2::ggplot(ggplot2::aes(x = weight, y = height, group = n)) +
  ggplot2::geom_hline(yintercept = c(0, 272), linetype = 2:1, linewidth = 1/3) +
  ggplot2::geom_line(alpha = 1/10) +
  ggplot2::coord_cartesian(ylim = c(-100, 400)) +
  ggplot2::ggtitle("b ~ dnorm(0, 10)") +
  ggplot2::theme_classic()

```
:::
:::
:::

The horizontal lines are reference lines: a dashed one at zero (no one
is shorter than zero) and a solid one at 272 cm for [Robert
Wadlow](https://en.wikipedia.org/wiki/Robert_Wadlow), the world's
tallest person.

"The pattern doesn't look like any human population at all. It
essentially says that the relationship // between weight and height
could be absurdly positive or negative. Before we've even seen the data,
this is a bad model." ([McElreath, 2020, p.
94/95](zotero://select/groups/5243560/items/NFUEVASQ))
([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=114&annotation=VIJ8XZIB))
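
One way to put a number on this impression is to count how many of the
simulated lines leave the range of possible human heights. The following
sketch repeats the prior simulation with an assumed weight range of 30
to 65 kg (a hypothetical stand-in for `range(d2_a$weight)`) and an
assumed mean weight of 45 kg; all `sk_` names are made up for
illustration.

```{r}
#| label: chap04-sketch-prior-implausible

set.seed(2971)

sk_n      <- 100
sk_a      <- stats::rnorm(sk_n, mean = 178, sd = 20)  # alpha prior
sk_b      <- stats::rnorm(sk_n, mean = 0, sd = 10)    # beta prior
sk_range  <- c(30, 65)                                # assumed weight range (kg)
sk_xbar_w <- 45                                       # assumed mean weight (kg)

## expected height of every line at both ends of the weight range
sk_h_low  <- sk_a + sk_b * (sk_range[1] - sk_xbar_w)
sk_h_high <- sk_a + sk_b * (sk_range[2] - sk_xbar_w)

## a line is implausible if it goes below 0 cm or above 272 cm somewhere;
## for a straight line it is enough to check the two end points
sk_bad <- base::pmin(sk_h_low, sk_h_high) < 0 |
  base::pmax(sk_h_low, sk_h_high) > 272

base::mean(sk_bad)     # share of implausible lines
base::mean(sk_b < 0)   # share of lines with a negative slope
```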
:::
:::

::: my-theorem
::: my-theorem-header
::: {#thm-log-normal-m4-3}
: Defining the $\beta$ prior as a Log-Normal distribution
:::
:::

::: my-theorem-container
$$
\beta \sim \operatorname{Log-Normal}(0,1)
$$ {#eq-prior-log-normal-m4-3}

------------------------------------------------------------------------

> "Defining $\beta$ as $\operatorname{Log-Normal}(0,1)$ means to claim
> that the logarithm of $\beta$ has a $\operatorname{Normal}(0,1)$
> distribution." ([McElreath, 2020, p.
> 96](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=115&annotation=YQFLM6PR))
:::
:::

::: my-example
::: my-example-header
<div>

: Log-Normal distribution

</div>
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-log-normal-m4-3a}
a: Log-Normal distribution (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-log-normal-m4-3a
#| fig-cap: "Log-Normal distribution (Original)"


set.seed(4) # to reproduce with tidyverse version
## R code 4.40a ####################
b <- stats::rlnorm(1e4, 0, 1)
rethinking::dens(b, xlim = c(0, 5), adj = 0.1)
```
:::
:::

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-log-normal-m4-3b}
b: Log-Normal distribution (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-log-normal-m4-3b
#| fig-cap: "Log-Normal distribution (Tidyverse)"

set.seed(4)

tibble::tibble(b = stats::rlnorm(1e4, meanlog = 0, sdlog = 1)) |> 
  ggplot2::ggplot(ggplot2::aes(x = b)) +
  ggplot2::geom_density() +
  ggplot2::coord_cartesian(xlim = c(0, 5)) +
  ggplot2::theme_bw()

```
:::
:::

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Argument matching
:::

::: my-watch-out-container
Kurz wrote just `mean` and `sd` instead of `meanlog` and `sdlog`. These
shorter argument names work because of the [partial matching feature in
argument
evaluation](https://cran.r-project.org/doc/manuals/R-lang.html#Argument-matching)
of R functions. But for educational reasons (risk of misunderstanding,
clashes with other matching arguments and less readable code) I use this
technique only occasionally, in interactive sessions.
:::
:::

Base R provides the `dlnorm()` (density) and `rlnorm()` (random
generation) functions for working with log-normal distributions.

Using the Log-Normal distribution as prior for $\beta$ prohibits
negative values of the slope. This is an important constraint here,
because we expect height to increase with weight, not to decrease.

> "The reason is that `exp(x)` is greater than zero for any real number
> $x$. This is the reason that Log-Normal priors are commonplace. They
> are an easy way to enforce positive relationships." ([McElreath, 2020,
> p. 96](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=115&annotation=8DAGFAH2))
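
A minimal sketch of why the Log-Normal prior enforces positive values, using only {**stats**} functions. The object names (suffix `_sk`) are chosen for this illustration and are not part of the original code. The quantiles of $\operatorname{Log-Normal}(0,1)$ are the exponentiated quantiles of $\operatorname{Normal}(0,1)$, and exponentiating any real number returns a positive value.

```{r}
#| label: sketch-log-normal-positive

# quantiles of Normal(0, 1) and of Log-Normal(0, 1) at the same probabilities
p_sk       <- c(0.055, 0.5, 0.945)
q_norm_sk  <- stats::qnorm(p_sk, mean = 0, sd = 1)
q_lnorm_sk <- stats::qlnorm(p_sk, meanlog = 0, sdlog = 1)

# the Log-Normal quantiles are the exponentiated Normal quantiles
all.equal(q_lnorm_sk, exp(q_norm_sk))

# exp() maps even strongly negative values to (small) positive values
exp(c(-10, -1, 0, 1, 10))
```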

###### Compare

::: my-r-code
::: my-r-code-header
::: {#cnj-comparenormal-log-normal-m4-3b}
b: Compare Normal(0,1) with log(Log-Normal(0,1))
:::
:::

::: my-r-code-container
> If you're unfamiliar with the log-normal distribution, it is the
> distribution whose logarithm is normally distributed. For example,
> here's what happens when we compare Normal(0,1) with
> log(Log-Normal(0,1)).
> ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#priors.))

```{r}
#| label: fig-compare-normal-log-normal
#| fig-cap: "Compare Normal(0,1) with log(Log-Normal(0,1))"

set.seed(4)

tibble::tibble(rnorm             = stats::rnorm(1e5, mean = 0, sd = 1),
       `log(rlognorm)` = base::log(stats::rlnorm(1e5, meanlog = 0, sdlog = 1))) |> 
  tidyr::pivot_longer(tidyr::everything()) |> 

  ggplot2::ggplot(ggplot2::aes(x = value)) +
  ggplot2::geom_density(fill = "grey92") +
  ggplot2::coord_cartesian(xlim = c(-3, 3)) +
  ggplot2::theme_bw() +
  ggplot2::facet_wrap(~ name, nrow = 2)
```

> Those values are ~~what~~ the mean and standard deviation of the
> output from the `rlnorm()` function **after** they are log
> transformed. The formulas for the actual mean and standard deviation
> for the log-normal distribution itself are complicated (see
> [Wikipedia](https://en.wikipedia.org/wiki/Log-normal_distribution)).
> ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#priors.))
:::
:::
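
As a hedged illustration of the last remark in the quote, the following sketch computes the mean and standard deviation of the Log-Normal distribution itself from `meanlog` ($\mu$) and `sdlog` ($\sigma$), using the standard formulas $\exp(\mu + \sigma^2/2)$ and $\sqrt{(\exp(\sigma^2) - 1)\exp(2\mu + \sigma^2)}$. The object names are assumptions made for this example.

```{r}
#| label: sketch-log-normal-moments

meanlog_sk <- 0
sdlog_sk   <- 1

# mean and standard deviation on the original (not the log) scale
mean_lnorm_sk <- exp(meanlog_sk + sdlog_sk^2 / 2)
sd_lnorm_sk   <- sqrt((exp(sdlog_sk^2) - 1) * exp(2 * meanlog_sk + sdlog_sk^2))

round(c(mean = mean_lnorm_sk, sd = sd_lnorm_sk), 2)
```

For $\operatorname{Log-Normal}(0,1)$ the mean is therefore about 1.65 and the standard deviation about 2.16, although the parameters of the distribution are 0 and 1.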

###### Log-Normal (Original)

::: my-r-code
::: my-r-code-header
<div>

: Prior predictive simulation again, now with the Log-Normal prior
(Original)

</div>
:::

::: my-r-code-container
```{r}
#| label: fig-prior-pred-sim-a
#| fig-cap: "Prior predictive simulation again, now with the Log-Normal prior: rethinking version"

## R code 4.41a ###################
set.seed(2971)
N_100_a <- 100 # 100 lines
a <- rnorm(N_100_a, 178, 20)
b <- rlnorm(N_100_a, 0, 1)

## R code 4.39a ###################
plot(NULL,
  xlim = range(d2_a$weight), ylim = c(-100, 400),
  xlab = "weight", ylab = "height"
)
abline(h = 0, lty = 2)
abline(h = 272, lty = 1, lwd = 0.5)
mtext("log(b) ~ dnorm(0,1)")
xbar <- mean(d2_a$weight)
for (i in 1:N_100_a) {
  curve(a[i] + b[i] * (x - xbar),
    from = min(d2_a$weight), to = max(d2_a$weight), add = TRUE,
    col = rethinking::col.alpha("black", 0.2)
  )
}


```
:::
:::

This is much more sensible. There is still a rare impossible
relationship. But nearly all lines in the joint prior for $\alpha$ and
$\beta$ are now within human reason.
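
To put a number on "more sensible", here is a small sketch that compares the two slope priors directly. It assumes that the earlier prior was $\beta \sim \operatorname{Normal}(0, 10)$, as in the first prior predictive simulation. The object names are chosen for this illustration only.

```{r}
#| label: sketch-compare-slope-priors

# prior probability of a negative slope under each prior
p_neg_normal_sk <- stats::pnorm(0, mean = 0, sd = 10)        # Normal(0, 10)
p_neg_lnorm_sk  <- stats::plnorm(0, meanlog = 0, sdlog = 1)  # Log-Normal(0, 1)
c(normal = p_neg_normal_sk, lognormal = p_neg_lnorm_sk)

# central 89% prior interval of the slope (cm per kg) under each prior
stats::qnorm(c(0.055, 0.945), mean = 0, sd = 10)
stats::qlnorm(c(0.055, 0.945), meanlog = 0, sdlog = 1)
```

Half of the prior mass of the Normal prior lies on negative slopes, whereas the Log-Normal prior puts none there.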

###### Log-Normal (Tidyverse)

::: my-r-code
::: my-r-code-header
::: {#cnj-ID-text}
b: Prior predictive simulation again, now with the Log-Normal prior
(Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-prior-pred-sim-b
#| fig-cap: "Prior predictive simulation again, now with the Log-Normal prior (Tidyverse)"

# make a tibble to annotate the plot
text <-
  tibble::tibble(weight = c(34, 43),
         height = c(0 - 25, 272 + 25),
         label  = c("Embryo", "World's tallest person (272 cm)"))

# simulate
base::set.seed(2971)

tibble::tibble(n = 1:n_lines,
       a = stats::rnorm(n_lines, mean = 178, sd = 20),
       b = stats::rlnorm(n_lines, meanlog = 0, sdlog = 1)) |> 
  tidyr::expand_grid(weight = base::range(d2_b$weight)) |> 
  dplyr::mutate(height = a + b * (weight - base::mean(d2_b$weight))) |>
  
  # plot
  ggplot2::ggplot(ggplot2::aes(x = weight, y = height, group = n)) +
  ggplot2::geom_hline(yintercept = c(0, 272), linetype = 2:1, linewidth = 1/3) +
  ggplot2::geom_line(alpha = 1/10) +
  ggplot2::geom_text(data = text,
            ggplot2::aes(label = label),
            size = 3) +
  ggplot2::coord_cartesian(ylim = c(-100, 400)) +
  ggplot2::ggtitle("log(b) ~ dnorm(0, 1)") +
  ggplot2::theme_bw()
```
:::
:::
:::
:::
:::

::: my-note
::: my-note-header
What is the correct prior?
:::

::: my-note-container
> "There is no more a uniquely correct prior than there is a uniquely
> correct likelihood. ...
>
> In choosing priors, there are simple guidelines to get you started.
> Priors encode states of information before seeing data. So priors
> allow us to explore the consequences of beginning with different
> information. In cases in which we have good prior information that
> discounts the plausibility of some parameter values, like negative
> associations between height and weight, we can encode that information
> directly into priors. When we don't have such information, we still
> usually know enough about the plausible range of values. And you can
> vary the priors and repeat the analysis in order to study // how
> different states of initial information influence inference.
> Frequently, there are many reasonable choices for a prior, and all of
> them produce the same inference." ([McElreath, 2020, p.
> 95/96](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=115&annotation=UJG4Y8MH))
:::
:::
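
The idea of varying the prior and repeating the analysis can be sketched without fitting a full model. The example below is a simplification and not the model of this chapter: it assumes a Normal likelihood with known standard deviation for the mean height only, so that the posterior mean has a closed form. The sample size, sample mean, standard deviation, the second prior and the helper `post_mean_sk()` are hypothetical values and names chosen for illustration.

```{r}
#| label: sketch-prior-sensitivity

# hypothetical data summary (illustrative values, not taken from a fitted model)
n_sk     <- 352    # number of observations
ybar_sk  <- 154.6  # sample mean of height in cm
sigma_sk <- 7.7    # standard deviation of height, treated as known

# posterior mean for a Normal prior on the mean with known sigma:
# a precision-weighted average of prior mean and sample mean
post_mean_sk <- function(prior_mean, prior_sd) {
  prior_prec <- 1 / prior_sd^2
  data_prec  <- n_sk / sigma_sk^2
  (prior_prec * prior_mean + data_prec * ybar_sk) / (prior_prec + data_prec)
}

# two quite different priors lead to almost the same posterior mean
c(prior_178_20 = post_mean_sk(178, 20),
  prior_150_50 = post_mean_sk(150, 50))
```

With this much data both priors are overwhelmed by the likelihood, which is one way in which many reasonable priors can produce the same inference.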

::: my-watch-out
::: my-watch-out-header
WATCH OUT! Prior predictive simulation and p-hacking
:::

::: my-watch-out-container
> "A serious problem in contemporary applied statistics is
> `r glossary("p-hacking")`, the practice of adjusting the model and the
> data to achieve a desired result. The desired result is usually a
> p-value less than 5%. The problem is that when the model is adjusted
> in light of the observed data, then p-values no longer retain their
> original meaning. False results are to be expected." ([McElreath,
> 2020, p. 97](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=116&annotation=MKCQIVRV))

::: my-resource
::: my-resource-header
Prior predictive simulation and `r glossary("p-hacking")`
:::

::: my-resource-container
The paper by Simmons, Nelson and Simonsohn (2011), [False-positive
psychology: Undisclosed flexibility in data collection and analysis
allows presenting anything as
significant](https://journals.sagepub.com/doi/10.1177/0956797611417632),
is often cited as an introduction to the problem.

Another more recent publication is: MacCoun, R. J. (2022). P-hacking: A
Strategic Analysis. In L. Jussim, J. A.
Krosnick, & S. T. Stevens (Eds.), Research Integrity: Best Practices for
the Social and Behavioral Sciences (pp. 295--315). Oxford University
Press. https://doi.org/10.1093/oso/9780190938550.003.0011.

A book-length treatment is: Chambers, C. (2017). Seven Deadly Sins of
Psychology: A Manifesto for Reforming the Culture of Scientific Practice
(Illustrated Edition). Princeton University Press.
:::
:::
:::
:::

### Finding the posterior distribution

In @exm-lm-height-weight we compared the linear model of heights
(@eq-height-linear-model2-m4-1) with the linear model of height against
weight (@eq-height-weight-linear-model1-m4-3). Now we repeat the
linear model of height against weight and compare it with the
corresponding R code.

::: my-example
::: my-example-header
::: {#exm-lm-height-weight-code-m4-3}
: Compare formula and R code for linear model heights against weights
(V2)
:::
:::

::: my-example-container
::: panel-tabset
###### Formula

::: my-theorem
::: my-theorem-header
::: {#thm-linear-heights-model-v2-m4-3}
: Formula of the linear model heights against weights (V2)
:::
:::

::: my-theorem-container
$$
\begin{align*}
h_{i} \sim \operatorname{Normal}(\mu_{i}, \sigma) \space \space (1) \\ 
\mu_{i} = \alpha + \beta(x_{i}-\overline{x}) \space \space (2) \\
\alpha \sim \operatorname{Normal}(178, 20) \space \space (3)  \\ 
\beta \sim \operatorname{Log-Normal}(0,1) \space \space (4) \\
\sigma \sim \operatorname{Uniform}(0, 50) \space \space (5)      
\end{align*}
$$ {#eq-height-weight-linear-model2-m4-3}
:::
:::

###### R Code

::: my-r-code
::: my-r-code-header
::: {#cnj-lm-code-v2-m4-3}
: R Code of the linear model heights against weights (V2)
:::
:::

::: my-r-code-container
```         
height ~ dnorm(mu, sigma)     # (1)
mu <- a + b * (weight - xbar) # (2)
a ~ dnorm(178, 20)            # (3)        
b ~ dlnorm(0, 1)              # (4)        
sigma ~ dunif(0, 50)          # (5)       
```
:::
:::

Notice that the linear model, in the R code (tab "R Code"), uses the R
assignment operator `<-` instead of the symbol `=` used in the formula.
:::
:::
:::
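
A short sketch of the difference between the two operators in the R code of the model, using only base R. `alist()` stores its arguments unevaluated, which is how the list of formulas is handed to `rethinking::quap()` in this chapter. The object name `flist_sk` is an assumption made for this example. Stochastic relations (likelihood and priors) are calls to `~`, whereas the deterministic linear model is a call to `<-`.

```{r}
#| label: sketch-alist-operators

flist_sk <- alist(
  height ~ dnorm(mu, sigma),      # (1) stochastic: likelihood
  mu <- a + b * (weight - xbar),  # (2) deterministic: linear model
  a ~ dnorm(178, 20),             # (3) stochastic: prior
  b ~ dlnorm(0, 1),               # (4) stochastic: prior
  sigma ~ dunif(0, 50)            # (5) stochastic: prior
)

# the operator at the head of each unevaluated expression
sapply(flist_sk, function(e) as.character(e[[1]]))
```

Nothing is computed at this point: neither `mu` nor `xbar` has to exist when the list is created.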

::: my-example
::: my-example-header
::: {#exm-find-post-dist-m4-3}
: Find the posterior distribution of the linear height-weight model
:::
:::

::: my-example-container
::: panel-tabset
###### Original1

::: my-r-code
::: my-r-code-header
::: {#cnj-find-post-dist-m4-3-a}
a: Find the posterior distribution of the linear height-weight model
(Original)
:::
:::

::: my-r-code-container
```{r}
#| label: find-post-dist-m4-3a

## R code 4.42a #############################

# define the average weight, x-bar
xbar_a <- mean(d2_a$weight)

# fit model
m4.3a <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * (weight - xbar_a),
    a ~ dnorm(178, 20),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ),
  data = d2_a
)

# summary result
## R code 4.44a ############################
rethinking::precis(m4.3a)
```
:::
:::

###### Original2 (log)

::: my-r-code
::: my-r-code-header
::: {#cnj-find-post-dist-log-m4-3a_2}
a: Find the posterior distribution of the linear height-weight model:
Log version (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: find-post-dist-log-m4-3a_2

## R code 4.43a ############################
m4.3a_2 <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + exp(log_b) * (weight - xbar_a),
    a ~ dnorm(178, 20),
    log_b ~ dnorm(0, 1),
    sigma ~ dunif(0, 50)
  ),
  data = d2_a
)

rethinking::precis(m4.3a_2)
```
:::
:::

Note the `exp(log_b)` in the definition of `mu`. This is the same model
as `m4.3a`. It will make the same predictions. But instead of $\beta$ in
the posterior distribution, you get $\log(\beta)$. It is easy to translate
between the two, because $\beta = \exp(\log(\beta))$. In code form:
`b <- exp(log_b)`.
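
A hedged sketch of this translation with simulated values: the draws of `log_b` below are hypothetical (a Normal distribution centred on $\log(0.9)$ with a standard deviation of 0.05), not samples from `m4.3a_2`. It shows that back-transformed draws are always positive and that, for such a narrow distribution, exponentiating the mean of `log_b` and averaging the exponentiated draws give almost the same value.

```{r}
#| label: sketch-exp-log-b

set.seed(4)

# hypothetical posterior draws of log_b (illustration only)
log_b_sk <- stats::rnorm(1e4, mean = log(0.9), sd = 0.05)

# translate every draw back to the slope scale
b_draws_sk <- exp(log_b_sk)

all(b_draws_sk > 0)
c(exp_of_mean = exp(mean(log_b_sk)),
  mean_of_exp = mean(b_draws_sk))
```

The two summaries are not identical in general, so it is safer to transform the draws first and to summarize afterwards.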

###### Tidyverse

::: my-r-code
::: my-r-code-header
::: {#cnj-find-post-dist-m4-3b_2}
b: Find the posterior distribution of the linear height-weight model
(Tidyverse)
:::
:::

::: my-r-code-container

```{r}
#| label: find-post-dist-m4-3b
#| cache: true

d2_b <-
  d2_b |> 
  dplyr::mutate(weight_c = weight - base::mean(weight))

m4.3b <- 
  brms::brm(data = d2_b, 
      family = gaussian,
      height ~ 1 + weight_c,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b, lb = 0),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.03b")

brms:::summary.brmsfit(m4.3b)
```
:::
:::

Remember: The detailed explanation of the syntax of the preceding
`brms::brm()` call is in the "Tidyverse" tab of @exm-chap04-m4-1.

Unlike with McElreath's `rethinking::quap()` formula syntax, Kurz is not
aware whether we can just specify something like `weight - xbar` in the
`formula` argument in `brms::brm()`.

However, the alternative is easy: Just make a new variable in the data
that is equivalent to `weight - mean(weight)`. We'll call it `weight_c`.
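
What centering does to the coefficients can be sketched with simulated data and `stats::lm()`. This is an ordinary least-squares illustration, not the Bayesian fit, and all names and values (`dat_sk`, the slope of 0.9, the residual standard deviation of 5) are hypothetical. The slope is unchanged by centering, while the intercept becomes the expected height at the average weight.

```{r}
#| label: sketch-centering

set.seed(4)

# hypothetical data: weight in kg, height in cm
dat_sk <- data.frame(weight = stats::runif(100, min = 30, max = 60))
dat_sk$height   <- 155 + 0.9 * (dat_sk$weight - 45) + stats::rnorm(100, sd = 5)
dat_sk$weight_c <- dat_sk$weight - mean(dat_sk$weight)

fit_raw_sk <- stats::lm(height ~ 1 + weight,   data = dat_sk)
fit_c_sk   <- stats::lm(height ~ 1 + weight_c, data = dat_sk)

# same slope, different intercept
rbind(raw = stats::coef(fit_raw_sk), centered = stats::coef(fit_c_sk))

# with a centered predictor the intercept equals the mean height
all.equal(unname(stats::coef(fit_c_sk)[1]), mean(dat_sk$height))
```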

###### Tidyverse2 (log)

::: my-r-code
::: my-r-code-header
::: {#cnj-find-post-dist-log-m4-3b_2}
Find the posterior distribution of the linear height-weight model (log
version) (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: find-post-dist-log-m4-3b_2
#| cache: true

m4.3b_log <- 
  brms::brm(data = d2_b, 
      family = gaussian,
      brms::bf(height ~ a + exp(lb) * weight_c,
         a ~ 1,
         lb ~ 1,
         nl = TRUE),
      prior = c(brms::prior(normal(178, 20), class = b, nlpar = a),
                brms::prior(normal(0, 1), class = b, nlpar = lb),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.03b_log")

brms:::summary.brmsfit(m4.3b_log)
```

Remember: The detailed explanation of the syntax of the preceding
`brms::brm()` call is in the "Tidyverse" tab of @exm-chap04-m4-1.

The difference is for the $\beta$ parameter, which we called `lb` in the
`m4.3b_log` model. If we term that parameter from `m4.3b` as
$\beta^{\mathrm{m4.3b}}$ and the one from our new log model
$\beta^{\mathrm{m4.3b\_log}}$, it turns out that
$\beta^{\mathrm{m4.3b}} = \exp(\beta^{\mathrm{m4.3b\_log}})$.
:::
:::

Compare the result with the previous tab "Tidyverse" in
@cnj-find-post-dist-m4-3b_2.

###### fixef

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-fixef-m4-3b}
b: Extract and compare the population-level (fixed) effects from object
`m4.3b` and from the log version `m4.3b_log`
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-fixed-effects-m4-3b
#| results: hold

brms::fixef(m4.3b)["weight_c", "Estimate"]
brms::fixef(m4.3b_log)["lb_Intercept", "Estimate"] |> exp()
```
:::
:::

They're the same within simulation variance.

###### Trace plot

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-trace-plot-m4-3b}
Display trace plots (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: trace-plot-m4-3b_2

brms:::plot.brmsfit(m4.3b, widths = c(1, 2))
```
:::
:::

::: my-note
::: my-note-header
Checking MCMC chains with trace plots and trank plots
:::

::: my-note-container
I have not yet learned how to interpret this type of graphical output.
McElreath explains in later video lectures that it is important that the
chains in a `r glossary("trace plot")` cover the same region of the
vertical axis (i.e., they do not jump around) and that the different
chains alternate in their (top) positions. I mention the top position
because this is where irregularities can be detected most easily.

In the above example it is difficult to decide whether this is the case
because the color differences between the chains are weak. Trace plots
are generally difficult to inspect; therefore McElreath proposes trace
rank plots or `r glossary("trank plot")` (his term) in @sec-chap09.
:::
:::
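
The idea behind a trank plot can be sketched in base R. The chains below are simulated and hypothetical (independent draws around the same value, i.e. an idealised healthy case), not the chains of `m4.3b`. All draws are ranked jointly and the ranks are then counted per chain in equally wide bins. If the chains explore the same region, every chain contributes a similar number of draws to every bin.

```{r}
#| label: sketch-rank-histogram

set.seed(4)
n_iter_sk  <- 1000
n_chain_sk <- 4

# hypothetical, well-mixed chains: one column per chain
chains_sk <- matrix(stats::rnorm(n_iter_sk * n_chain_sk, mean = 0.9, sd = 0.04),
                    nrow = n_iter_sk, ncol = n_chain_sk)

# rank all draws jointly, keeping the chain layout
ranks_sk <- matrix(rank(chains_sk), nrow = n_iter_sk, ncol = n_chain_sk)

# count the ranks of each chain in 10 equally wide bins
breaks_sk      <- seq(0, n_iter_sk * n_chain_sk, length.out = 11)
rank_counts_sk <- apply(ranks_sk, 2, function(r) table(cut(r, breaks = breaks_sk)))
rank_counts_sk
```

A chain that is stuck in a different region would pile up in the lowest or highest bins instead of spreading evenly.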
:::
:::
:::

To understand the differences in syntax between {**rethinking**} and
{**brms**} it is also quite revealing to look at
@tbl-mirror-rethinking-tidyverse. I have inserted the following table
from Kurz' explanation of model `b4.3`.

| {**rethinking**} package                                    | {**brms**} package                          |
|------------------------------------|------------------------------------|
| $\text{height}_i \sim \operatorname{Normal}(\mu_i, \sigma)$ | `family = gaussian`                         |
| $\mu_i = \alpha + \beta \text{weight}_i$                    | `height ~ 1 + weight_c`                     |
| $\alpha \sim \operatorname{Normal}(178, 20)$                | `prior(normal(178, 20), class = Intercept)` |
| $\beta \sim \operatorname{Log-Normal}(0, 1)$                | `prior(lognormal(0, 1), class = b)`         |
| $\sigma \sim \operatorname{Uniform}(0, 50)$                 | `prior(uniform(0, 50), class = sigma)`      |

: Compare statistical notation of rethinking with brms package
{#tbl-mirror-rethinking-tidyverse}

### Interpreting the posterior distribution

::: my-important
::: my-important-header
What do parameters mean?
:::

::: my-important-container
"Posterior probabilities of parameter values describe the relative
compatibility of different states of the world with the data, according
to the model. These are small world (@sec-chap02) numbers." ([McElreath,
2020, p. 99](zotero://select/groups/5243560/items/NFUEVASQ))
([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=118&annotation=N3YFPKMW))
:::
:::

Statistical models are hard to interpret. Plotting posterior
distributions and posterior predictions is better than attempting to
understand a table.

#### Table of marginal distributions

There are many different options for inspecting the marginal posterior
distributions in the tidyverse approach, that is, with {**brms**}. The
tabs below show several of them.

::: my-example
::: my-example-header
::: {#exm-chap04-table-interpretation}
Inspect the marginal posterior distributions of the parameters
:::
:::

::: my-example-container
::: panel-tabset
###### precis (O)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-inspect-lm-table-a}
a: Inspect the marginal posterior distributions of the parameters
(Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-inspect-lm-table-4-3a


## R code 4.44 ##########
rethinking::precis(m4.3a)
```

------------------------------------------------------------------------

1.  First row: quadratic approximation for $\alpha$
2.  Second row: quadratic approximation for $\beta$
3.  Third row: quadratic approximation for $\sigma$
:::
:::

Let's focus on `b` ($\beta$), because it's the new parameter. Since
$\beta$ is a slope, the value 0.90 can be read as *a person 1 kg
heavier is expected to be 0.90 cm taller*. 89% of the posterior
probability ($94.5 - 5.5$) lies between 0.84 and 0.97. That suggests that
$\beta$ values close to zero or greatly above one are highly
incompatible with these data and this model. It is most certainly not
evidence that the relationship between weight and height is linear,
because the model only considered lines. It just says that, if you are
committed to a line, then lines with a slope around 0.9 are plausible
ones.
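
The reading of the slope can be made concrete with a few lines of arithmetic. The sketch uses the rounded values quoted above (0.90 and the 89% bounds 0.84 and 0.97); the weight difference of 10 kg is an arbitrary choice for illustration, and the object names are assumptions.

```{r}
#| label: sketch-slope-interpretation

b_mean_sk  <- 0.90           # posterior mean of the slope (cm per kg)
b_pi89_sk  <- c(0.84, 0.97)  # 89% interval of the slope
delta_w_sk <- 10             # weight difference in kg (arbitrary)

# expected height difference in cm for two people 10 kg apart
c(mean  = b_mean_sk * delta_w_sk,
  lower = b_pi89_sk[1] * delta_w_sk,
  upper = b_pi89_sk[2] * delta_w_sk)
```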

###### vcov (O)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-lm-vcov-a}
a: Display variance-covariance matrix (Original)
:::
:::

::: my-r-code-container
The numbers in the default `rethinking::precis()` output aren't
sufficient to describe the quadratic posterior completely. Therefore we
also need to inspect the variance-covariance matrix.

```{r}
#| label: chap04-lm-vcov-m4-3a

## R code 4.45a ################
round(rethinking::vcov(m4.3a), 3)
```
:::
:::

There is very little covariation among the parameters in this case.
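
Covariances are hard to judge because they depend on the scale of the parameters. One way to make them comparable, sketched here under assumptions, is to convert the matrix to correlations with `stats::cov2cor()`. The matrix below is made up for illustration (small variances, almost no covariance); with the fitted model one would pass the variance-covariance matrix of `m4.3a` instead.

```{r}
#| label: sketch-cov2cor

# hypothetical variance-covariance matrix (illustrative values only)
vcov_sk <- matrix(
  c(0.0730, 0.0000, 0.0005,
    0.0000, 0.0020, 0.0000,
    0.0005, 0.0000, 0.0370),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("a", "b", "sigma"), c("a", "b", "sigma"))
)

sqrt(diag(vcov_sk))      # standard deviations of the parameters
stats::cov2cor(vcov_sk)  # correlations, bounded between -1 and 1
```

Correlations close to zero, as in this made-up matrix, mean that learning one parameter tells us almost nothing about the others.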

###### pairs (O)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-lm-pairs-a}
a: Display the marginal posteriors and the covariance in a pairs plot (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-lm-pairs-m4-3a
#| warning: false

rethinking::pairs(m4.3a)
```
:::
:::

The graphic shows both the marginal posteriors and the covariance.

###### brms (T)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-post-summary-b}
b: Display the marginal posterior distributions of the parameters (brms)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-post-summary-m4-3b

brms::posterior_summary(m4.3b, probs = c(0.055, 0.945))[1:3, ] |> 
  round(digits = 2)
```

Looking up `brms::posterior_summary()` I learned that this "function
mainly exists to retain backwards compatibility. It will eventually be
replaced by functions of the {**posterior**} package".
([brms](https://paul-buerkner.github.io/brms/reference/posterior_summary.html))
:::
:::

::: my-watch-out
::: my-watch-out-header
WATCH OUT! How to insert coefficients into functions with {**brms**}?
:::

::: my-watch-out-container
> {**brms**} does not allow users to insert coefficients into functions
> like `exp()` within the conventional `formula` syntax. We can fit a
> {**brms**} model like McElreath's `m4.3a` if we adopt what's called
> the [non-linear
> syntax](https://cran.r-project.org/web/packages/brms/vignettes/brms_nonlinear.html).
> The non-linear syntax is a lot like the syntax McElreath uses in
> {**rethinking**} in that it typically includes both predictor and
> variable names in the `formula`.
> ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#overthinking-logs-and-exps-oh-my.))

Kurz promises to explain later in his ebook what non-linear syntax
exactly means.
:::
:::

###### draws1 (T)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-post-summarize-draws-m4-3b}
b: Display the marginal posterior distributions of the parameters with
`summarize_draws()` from {**posterior**}
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-post-summarize-draws-m4-3b

posterior::summarize_draws(m4.3b, "mean", "median", "sd", 
                           ~quantile(., probs = c(0.055, 0.945)),
                           .num_args = list(sigfig = 2))[1:3, ]
```
:::
:::

###### draws2 (T)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-summary-draws-m4-3b}
b: Display the marginal posterior distributions of the parameters of
`brms::as_draws_array()` objects
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-summary-draws-m4-3b

## summary for draws object
summary(brms::as_draws_array(m4.3b))[1:3, ]

```
:::
:::

###### vcov1 (T)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-vcov1-b}
b: Display variance-covariance matrix (brms)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-vcov-m4-3b

brms:::vcov.brmsfit(m4.3b) |> 
  round(3)
```

We got the variance-covariance matrix of the `Intercept` and `weight_c`
coefficients, but not of $\sigma$. To get that, we'll have to extract
the posterior draws and use the `cov()` function instead. (See next tab
"vcov2 (T)".)
:::
:::

###### vcov2 (T)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-vcov2-b}
b: Variance-covariance matrix with {**posterior**} draws objects (brms)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-cov-m4-3b
#| warning: false

brms::as_draws_df(m4.3b) |>
  dplyr::select(b_Intercept:sigma) |>
  stats::cov() |>
  base::round(digits = 3)
```
:::
:::

###### pairs (T)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-lm-pairs-b}
b: Display the marginal posteriors and the covariance in a pairs plot (brms)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-lm-pairs-m4-3b

brms:::pairs.brmsfit(m4.3b)
```
:::
:::

The graphic shows both the marginal posteriors and the covariance.
:::
:::
:::

#### Plotting posterior inference against the data

::: my-example
::: my-example-header
::: {#exm-chap04-plot-post-inf-against-data}
: Height plotted against weight with linear regression (line at the
posterior mean)
:::
:::

::: my-example-container
::: panel-tabset
###### Original

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-plot-raw-data-line-m4-3a}
a: Height in centimeters (vertical) plotted against weight in kilograms
(horizontal), with the line at the posterior mean plotted in black
(Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-raw-data-line-m4-3a
#| fig-cap: "Height in centimeters (vertical) plotted against weight in kilograms (horizontal), with the line at the posterior mean plotted in black: rethinking version"

## R code 4.46a ############################################
plot(height ~ weight, data = d2_a, col = rethinking::rangi2)
post_m4.3a <- rethinking::extract.samples(m4.3a)
a_map <- mean(post_m4.3a$a)
b_map <- mean(post_m4.3a$b)
curve(a_map + b_map * (x - xbar_a), add = TRUE)
```
:::
:::

Each point in this plot is a single individual. The black line is
defined by the mean slope $\beta$ and mean intercept $\alpha$. This is
not a bad line. It certainly looks highly plausible. But there are an
infinite number of other highly plausible lines near it. See the next
tab.
###### Tidyverse1

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-plot-raw-data-line-m4-3b}
b: Height in centimeters (vertical) plotted against weight_c
(horizontal), with the line at the posterior mean plotted in black with centered weight values (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-raw-data-line-m4-3b
#| fig-cap: "Height in centimeters (vertical) plotted against weight_c (horizontal), with the line at the posterior mean plotted in black with centered weight values (Tidyverse)"

d2_b |>
  ggplot2::ggplot(ggplot2::aes(x = weight_c, y = height)) +
  ggplot2::geom_abline(intercept = brms:::fixef.brmsfit(m4.3b)[1], 
              slope     = brms:::fixef.brmsfit(m4.3b)[2]) +
  ggplot2::geom_point(shape = 1, size = 2, color = "royalblue") +
  ggplot2::theme_bw()
```

Note how the breaks on our `x`-axis look off. That's because we fit the
model with `weight_c` and we plotted the points in that metric, too.
Since we computed `weight_c` by subtracting the mean of weight from the
data, we can adjust the `x`-axis break point labels by simply adding
that value back. (See next tab "Tidyverse2")
:::
:::

Further note the use of the `brms:::fixef.brmsfit()` function within
`ggplot2::geom_abline()`. The function extracts the population-level
('fixed') effects from a `brmsfit` object.

###### Tidyverse2

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-plot-raw-data-line2-m4-3b}
b: Height in centimeters (vertical) plotted against weight_c
(horizontal), with the line at the posterior mean plotted in black with centered weights in kg (Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-raw-data-line2-m4-3b
#| fig-cap: "Height in centimeters (vertical) plotted against weight in kilograms (horizontal), with the line at the posterior mean plotted in black with centered weights in kg (Tidyverse)"

labels <-
  c(-10, 0, 10) + base::mean(d2_b$weight) |> 
  base::round(digits = 0)

d2_b |>
  ggplot2::ggplot(ggplot2::aes(x = weight_c, y = height)) +
  ggplot2::geom_abline(intercept = brms::fixef(m4.3b, probs = c(0.055, 0.945))[[1]], 
              slope     = brms::fixef(m4.3b, probs = c(0.055, 0.945))[[2]]) +
  ggplot2::geom_point(shape = 1, size = 2, color = "royalblue") +
  ggplot2::scale_x_continuous("weight",
                     breaks = c(-10, 0, 10),
                     labels = labels) +
  ggplot2::theme_bw()
```
:::
:::

Note the use of the `brms::fixef()` function within
`ggplot2::geom_abline()`. The function extracts the population-level
('fixed') effects from a `brmsfit` object.
:::
:::
:::

::: my-note
::: my-note-header
What are population-level ("fixed") effects?
:::

::: my-note-container
At first I didn't know exactly what it means that `brms::fixef()`
extracts the population-level ('fixed') effects from a `brmsfit` object.
After reading other books I now understand that these are effects for
the whole population, in contrast to effects at the individual level
(so-called "random effects"). (The terms "fixed" and "random" effects
are misleading, as both effects are random.)

> "Generally, these are the three types of parameters in multi-level
> models: the population-level estimate (commonly called fixed effects),
> the participant-level estimates (random effects) and the
> participant-level variation." ([Schmettow, 2022, p.
> 268](zotero://select/groups/5254846/items/HCW7A9WK))
> ([pdf](zotero://open-pdf/groups/5254846/items/WF7URHBJ?page=276&annotation=E5FND2P9))

> "There is a lot of confusion about the type of models that we deal
> with in this chapter. They have also been called hierarchical models
> or mixed effects models. The "mixed" stands for a mixture of so called
> fixed effects and random effects. The problem is: if you start by
> understanding what fixed effects and random effects are, confusion is
> programmed, not only because there exist several very different
> definitions. In fact, it does not matter so much whether an estimate
> is a fixed effect or random effect. As we will see, you can construct
> a multi-level model by using just plain descriptive summaries. What
> matters is that a model contains estimates on population level and on
> participant level. The benefit is, that a multi-level model can answer
> the same question for the population as a whole and for every single
> participant." ([Schmettow, 2022, p.
> 278](zotero://select/groups/5254846/items/HCW7A9WK))
> ([pdf](zotero://open-pdf/groups/5254846/items/WF7URHBJ?page=286&annotation=N4M4YSCS))

::: my-resource
::: my-resource-header
Fixed and random effects
:::

::: my-resource-container
After some googling it turned out that there is a great discussion about
[Fixed Effects in Linear
Regression](https://statisticsglobe.com/fixed-effects-linear-regression)
and, more generally, about fixed, random and mixed models (see Cross
Validated
[here](https://stats.stackexchange.com/questions/4700/what-is-the-difference-between-fixed-effect-random-effect-and-mixed-effect-mode)
and
[here](https://stats.stackexchange.com/questions/21760/what-is-a-difference-between-random-effects-fixed-effects-and-marginal-model)).

There is also a glossary entry in
[statistics.com](https://www.statistics.com/glossary/fixed-effects/).

And last but not least there are

-   academic papers ([Let's Talk About Fixed
    Effects](https://link.springer.com/article/10.1007/s11577-020-00699-8)),
-   tutorials ([Fixed Effects
    Regression](https://www.econometrics-with-r.org/10-3-fixed-effects-regression.html),
    [Fixed or Random
    Effects](https://bookdown.org/Yuleng/polimethod/fixed.html)) and
-   R packages dedicated especially to this subject
    ([fixest](https://lrberge.github.io/fixest/index.html), see its
    [introduction](https://lrberge.github.io/fixest/articles/fixest_walkthrough.html)).
:::
:::
:::
:::

#### Adding uncertainty around the mean

> "Plots of the average line, like @fig-raw-data-line-m4-3a, are useful
> for getting an impression of the magnitude of the estimated influence
> of a variable. But they do a poor job of communicating uncertainty.
> Remember, the posterior distribution considers every possible
> regression line connecting height to weight. It assigns a relative
> plausibility to each. This means that each combination of $\alpha$ and
> $\beta$ has a posterior probability. It could be that there are many
> lines with nearly the same posterior probability as the average line.
> Or it could be instead that the posterior distribution is rather
> narrow near the average line." ([McElreath, 2020, p.
> 101](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=120&annotation=YN73Y4A4))

::: my-example
::: my-example-header
::: {#exm-chap04-uncertainty-around-mean}
: Inspect the uncertainty around the mean
:::
:::

::: my-example-container
::: panel-tabset
###### rows (O)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-show-some-post-samples-m4-3a}
a: Show some posterior data rows (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-show-some-post-samples-m4-3a
#| warning: false

## R code 4.47a ################
# post_m4.3a <- rethinking::extract.samples(m4.3a) # already in previous listing
set.seed(4)
bayr::as_tbl_obs(post_m4.3a)
```
:::
:::

> "Each row is a correlated random sample from the joint posterior of
> all three parameters, using the covariances provided by `vcov(m4.3a)`
> in tab "vcov (O)" of @exm-chap04-table-interpretation. The paired
> values of `a` and `b` on each row define a line. The average of very
> many of these lines is the posterior mean line. But the scatter around
> that average is meaningful, because it alters our confidence in the
> relationship between the predictor and the outcome." ([McElreath, 2020,
> p. 101](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=120&annotation=KS7NLQ9U))
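
How such correlated rows can be generated is sketched below with `MASS::mvrnorm()`, which draws from a multivariate Normal distribution defined by a mean vector and a variance-covariance matrix. The means and the (diagonal) covariance matrix are hypothetical stand-ins; with the fitted model one would use the coefficients and the variance-covariance matrix of `m4.3a`.

```{r}
#| label: sketch-mvrnorm-rows

set.seed(4)

# hypothetical posterior means and variance-covariance matrix
mu_sk    <- c(a = 154.6, b = 0.9, sigma = 5.1)
Sigma_sk <- diag(c(0.073, 0.002, 0.037))

# each row is one draw of (a, b, sigma) from the joint distribution
post_sk <- MASS::mvrnorm(n = 1e4, mu = mu_sk, Sigma = Sigma_sk)
utils::head(post_sk, n = 3)

# every row defines a line; their average is close to the line at the means
colMeans(post_sk)
```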

###### plot1 (O)

::: my-r-code
::: my-r-code-header
::: {#cnj-fig-chap04-plot-10-points-m4-3a}
a: Plot 20 lines sampled from the posterior of a model fit to the first
10 data points, to show the uncertainty around the mean (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-plot-10-points-m4-3a
#| fig-cap: "Samples from the quadratic approximate posterior distribution for the height/weight model m4.3a, fit to the first 10 data points. 20 lines sampled from the posterior distribution, showing the uncertainty in the regression relationship (Original)"

## R code 4.48a ##########################
N10_a <- 10
dN10_a <- d2_a[1:N10_a, ]
mN10_a <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * (weight - mean(weight)),
    a ~ dnorm(178, 20),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ),
  data = dN10_a
)

## R code 4.49a ##############################
# extract 20 samples from the posterior
set.seed(4)
post_20_m4.3a <- rethinking::extract.samples(mN10_a, n = 20)

# display raw data and sample size
plot(dN10_a$weight, dN10_a$height,
  xlim = range(d2_a$weight), ylim = range(d2_a$height),
  col = rethinking::rangi2, xlab = "weight", ylab = "height"
)
mtext(rethinking:::concat("N = ", N10_a))

# plot the lines, with transparency
for (i in 1:20) {
  curve(post_20_m4.3a$a[i] + post_20_m4.3a$b[i] * (x - mean(dN10_a$weight)),
    col = rethinking::col.alpha("black", 0.3), add = TRUE
  )
}

```
:::
:::

> "By plotting multiple regression lines, sampled from the posterior, it
> is easy to see both the highly confident aspects of the relationship
> and the less confident aspects. The cloud of regression lines displays
> greater uncertainty at extreme values for weight." ([McElreath, 2020,
> p. 102](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=121&annotation=CX5K4AHV))

###### plot2 (O)

::: my-r-code
::: my-r-code-header
::: {#cnj-fig-chap04-plot-50-points-m4-3a}
a: Plot 20 lines sampled from the posterior of a model fit to the first
50 data points, showing the uncertainty around the mean (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-plot-50-points-m4-3a
#| fig-cap: "Samples from the quadratic approximate posterior distribution for the height/weight model, m4.3a. 20 lines sampled from the posterior of the model fit to the first 50 data points, showing the uncertainty in the regression relationship (Original)"

## R code 4.48a ######################
N50_a <- 50
dN50_a <- d2_a[1:N50_a, ]
mN50_a <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * (weight - mean(weight)),
    a ~ dnorm(178, 20),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ),
  data = dN50_a
)

## R code 4.49a ######################
# extract 20 samples from the posterior
set.seed(4)
post_50_m4.3a <- rethinking::extract.samples(mN50_a, n = 20)

# display raw data and sample size
plot(dN50_a$weight, dN50_a$height,
  xlim = range(d2_a$weight), ylim = range(d2_a$height),
  col = rethinking::rangi2, xlab = "weight", ylab = "height"
)
mtext(rethinking:::concat("N = ", N50_a))

# plot the lines, with transparency
for (i in 1:20) {
  curve(post_50_m4.3a$a[i] + post_50_m4.3a$b[i] * (x - mean(dN50_a$weight)),
    col = rethinking::col.alpha("black", 0.3), add = TRUE
  )
}

```
:::
:::

> "Notice that the cloud of regression lines grows more compact as the
> sample size increases. This is a result of the model growing more
> confident about the location of the mean." ([McElreath, 2020, p.
> 102](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=121&annotation=92MJX4AG))
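
The shrinking cloud can also be checked numerically. The following sketch is a stand-in under explicit assumptions: it simulates height/weight data (it does not use `d2_a`), and it fits ordinary least squares with `stats::lm()` instead of `quap()`, simply because the standard error of the slope is then directly available. All parameter values are hypothetical.

```r
set.seed(4)

# Simulated stand-in data (hypothetical parameters, not the !Kung data)
n_total <- 352
sim_weight <- rnorm(n_total, mean = 45, sd = 6.5)
sim_height <- 154 + 0.9 * (sim_weight - mean(sim_weight)) +
  rnorm(n_total, sd = 5)
sim_d <- data.frame(weight = sim_weight, height = sim_height)

# Standard error of the slope when fitting only the first n rows
slope_se <- function(n) {
  fit <- lm(height ~ weight, data = sim_d[seq_len(n), ])
  summary(fit)$coefficients["weight", "Std. Error"]
}

# Same subset sizes as in the plots: 10, 50 and all 352 rows
sketch_se <- sapply(c(N10 = 10, N50 = 50, N352 = 352), slope_se)
round(sketch_se, 3)
```

The standard error of the slope should generally decrease as more rows are used, which is the numerical counterpart of the regression lines bunching together.
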

###### plot3 (O)

::: my-r-code
::: my-r-code-header
::: {#cnj-fig-chap04-plot-352-points-m4-3a}
a: Plot 20 lines sampled from the posterior of a model fit to all 352
data points, showing the uncertainty around the mean (Original)
:::
:::

::: my-r-code-container
```{r}
#| label: fig-chap04-plot-352-points-m4-3a
#| fig-cap: "Samples from the quadratic approximate posterior distribution for the height/weight model, m4.3a. 20 lines sampled from the posterior of the model fit to all 352 data points, showing the uncertainty in the regression relationship (Original)"

## R code 4.48, 4.49 ###########################
N352_a <- 352
dN352_a <- d2_a[1:N352_a, ]
mN352_a <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * (weight - mean(weight)),
    a ~ dnorm(178, 20),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ),
  data = dN352_a
)

# extract 20 samples from the posterior
post_352_m4.3a <- rethinking::extract.samples(mN352_a, n = 20)

# display raw data and sample size
plot(dN352_a$weight, dN352_a$height,
  xlim = range(d2_a$weight), ylim = range(d2_a$height),
  col = rethinking::rangi2, xlab = "weight", ylab = "height"
)
mtext(rethinking:::concat("N = ", N352_a))

# plot the lines, with transparency
for (i in 1:20) {
  curve(post_352_m4.3a$a[i] + post_352_m4.3a$b[i] * (x - mean(dN352_a$weight)),
    col = rethinking::col.alpha("black", 0.3), add = TRUE
  )
}

```
:::
:::

> "Notice that the cloud of regression lines grows more compact as the
> sample size increases. This is a result of the model growing more
> confident about the location of the mean." ([McElreath, 2020, p.
> 102](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=121&annotation=92MJX4AG))

###### rows (T)

::: my-r-code
::: my-r-code-header
::: {#cnj-chap04-show-some-post-samples-m4-3b}
b: Extract the iterations of the Hamiltonian Monte Carlo
(`r glossary("HMC")`) chains into a data frame and show 8 rows
(Tidyverse)
:::
:::

::: my-r-code-container
```{r}
#| label: chap04-show-some-post-samples-m4-3b
#| warning: false

post_m4.3b <- brms::as_draws_df(m4.3b)
set.seed(4)
bayr::as_tbl_obs(post_m4.3b)
```

:::::{.my-note}
:::{.my-note-header}
Printout interpreted
:::
::::{.my-note-container}
-   `b_Intercept` represents `a` in the rethinking version.
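-   `b_weight_c` represents `b` in the rethinking version. The name
    differs because the {**brms**} model uses the pre-centered predictor
    `weight_c`, whereas the rethinking model centers `weight` inside the
    model formula.
-   `sigma` contains the posterior draws of the standard deviation. In
    contrast to the rethinking version, they come from the HMC chains
    and not from a quadratic approximation.
-   `lprior` is, I assume, the log prior density.
-   `lp__` is, as far as I understand it, the (unnormalized) log
    posterior density.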
::::
:::::


:::
:::

Instead of `rethinking::extract.samples()`, the {**brms**} package
extracts all the posterior draws with `brms::as_draws_df()`. We have
already done this with model `m4.1b` (@exm-chap04-sampling-brm-m4-1b in
tab "draws1").

###### plots (T)

::: my-r-code
::: my-r-code-header
<div>

b: Plot 20 lines sampled from the posteriors of models fit to 10, 50, 150
and 352 data points, showing the uncertainty around the mean (Tidyverse)

</div>
:::

::: my-r-code-container
```{r}
#| label: chap04-plot-sampled-lines-m4-3b
#| warning: false
#| cache: true

## 1. Calculate all four models ################
dN10_b <- 10

m4.3b_010 <- 
  brms::brm(data = d2_b |>
      dplyr::slice(1:dN10_b),  # note our tricky use of `N` and `slice()`
      family = gaussian,
      height ~ 1 + weight_c,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.03b_010")

dN50_b <- 50

m4.3b_050 <- 
  brms::brm(data = d2_b |>
      dplyr::slice(1:dN50_b), 
      family = gaussian,
      height ~ 1 + weight_c,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.03b_050")

dN150_b <- 150

m4.3b_150 <- 
  brms::brm(data = d2_b |>
      dplyr::slice(1:dN150_b), 
      family = gaussian,
      height ~ 1 + weight_c,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.03b_150")

dN352_b <- 352

m4.3b_352 <- 
  brms::brm(data = d2_b |>
      dplyr::slice(1:dN352_b), 
      family = gaussian,
      height ~ 1 + weight_c,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.03b_352")


## 2. put model chains into dfs ##########
post010_m4.3b <- brms::as_draws_df(m4.3b_010)
post050_m4.3b <- brms::as_draws_df(m4.3b_050)
post150_m4.3b <- brms::as_draws_df(m4.3b_150)
post352_m4.3b <- brms::as_draws_df(m4.3b_352)


## 3. prepare plots ##########
p10 <- 
  ggplot2::ggplot(data =  d2_b[1:10, ], 
         ggplot2::aes(x = weight_c, y = height)) +
  ggplot2::geom_abline(data = post010_m4.3b |> dplyr::slice(1:20),
              ggplot2::aes(intercept = b_Intercept, slope = b_weight_c),
              linewidth = 1/3, alpha = .3) +
  ggplot2::geom_point(shape = 1, size = 2, color = "royalblue") +
  ggplot2::coord_cartesian(xlim = base::range(d2_b$weight_c),
                  ylim = base::range(d2_b$height)) +
  ggplot2::labs(subtitle = "N = 10")

p50 <-
  ggplot2::ggplot(data =  d2_b[1:50, ], 
         ggplot2::aes(x = weight_c, y = height)) +
  ggplot2::geom_abline(data = post050_m4.3b |> dplyr::slice(1:20),
              ggplot2::aes(intercept = b_Intercept, slope = b_weight_c),
              linewidth = 1/3, alpha = .3) +
  ggplot2::geom_point(shape = 1, size = 2, color = "royalblue") +
  ggplot2::coord_cartesian(xlim = base::range(d2_b$weight_c),
                  ylim = base::range(d2_b$height)) +
  ggplot2::labs(subtitle = "N = 50")

p150 <-
  ggplot2::ggplot(data =  d2_b[1:150, ], 
         ggplot2::aes(x = weight_c, y = height)) +
  ggplot2::geom_abline(data = post150_m4.3b |> dplyr::slice(1:20),
              ggplot2::aes(intercept = b_Intercept, slope = b_weight_c),
              linewidth = 1/3, alpha = .3) +
  ggplot2::geom_point(shape = 1, size = 2, color = "royalblue") +
  ggplot2::coord_cartesian(xlim = base::range(d2_b$weight_c),
                  ylim = base::range(d2_b$height)) +
  ggplot2::labs(subtitle = "N = 150")

p352 <- 
  ggplot2::ggplot(data =  d2_b[1:352, ], 
         ggplot2::aes(x = weight_c, y = height)) +
  ggplot2::geom_abline(data = post352_m4.3b |> dplyr::slice(1:20),
              ggplot2::aes(intercept = b_Intercept, slope = b_weight_c),
              linewidth = 1/3, alpha = .3) +
  ggplot2::geom_point(shape = 1, size = 2, color = "royalblue") +
  ggplot2::coord_cartesian(xlim = base::range(d2_b$weight_c),
                  ylim = base::range(d2_b$height)) +
  ggplot2::labs(subtitle = "N = 352")

## 4. display plots {patchwork} ########
library(patchwork)
(p10 + p50 + p150 + p352) &
  ggplot2::scale_x_continuous("weight",
                     breaks = c(-10, 0, 10),
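                     # axis labels on the original weight scale (35, 45, 55)
                     labels = base::round(c(-10, 0, 10) + base::mean(d2_b$weight), digits = 0)) &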
  ggplot2::theme_bw()
```
:::
:::

> "Notice that the cloud of regression lines grows more compact as the
> sample size increases. This is a result of the model growing more
> confident about the location of the mean." ([McElreath, 2020, p.
> 102](zotero://select/groups/5243560/items/NFUEVASQ))
> ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=121&annotation=92MJX4AG))
:::
:::
:::


#### Plotting regression intervals and contours

> “The cloud of regression lines in @exm-chap04-uncertainty-around-mean is an appealing display, because it communicates uncertainty about the relationship in a way that many people find intuitive. But it’s more common, and often much clearer, to see the uncertainty displayed by plotting an interval or contour around the average regression line.” ([McElreath, 2020, p. 102](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=121&annotation=EUY9378D))

:::::{.my-procedure}
:::{.my-procedure-header}
:::::: {#prp-chap04-generate-predictions}
: Generating predictions and intervals from the posterior of a fit model
::::::
:::
::::{.my-procedure-container}

> 1. Use `link()` to generate distributions of posterior values for $\mu$. The default behavior of `link()` is to use the original data, so you have to pass it a list of new horizontal axis values you want to plot posterior predictions across. 
> 2. Use summary functions like `base::mean()` or `rethinking::PI()` to find averages and lower and upper bounds of $\mu$ for each value of the predictor variable. 
> 3. Finally, use plotting functions like `graphics::lines()` and `rethinking::shade()` to draw the lines and intervals. Or you might plot the distributions of the predictions, or do further numerical calculations with them. ([McElreath, 2020, p. 107](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=126&annotation=GHV2HBLI))
::::
:::::

To explain the first task of @prp-chap04-generate-predictions, McElreath walks through several preparatory steps. Because the whole procedure comprises nine steps, I separate the original and the tidyverse approach into @exm-chap04-generating-predictions-and-intervals-m4-3a (Original) and @exm-chap04-generating-predictions-and-intervals-m4-3b (Tidyverse).

:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-chap04-generating-predictions-and-intervals-m4-3a}
a: Plotting regression intervals and contours (Original)
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### mean

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-predict-mean-50-m4-3a}
a: Calculate the uncertainty around the average regression line at a weight of 50 kg (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-predict-mean-50-m4-3a

## R code 4.50a ##########################

set.seed(4)                                             # 1
post_m4.3a <- rethinking::extract.samples(m4.3a)        # 2
mu_at_50_a <- post_m4.3a$a + post_m4.3a$b * (50 - xbar) # 3
str(mu_at_50_a)                                         # 4
```

Comments for the different R code lines:

1.  Set seed for exact reproducibility
2.  Repeat the code for drawing (extracting and collecting) samples from the
    fitted model `m4.3a` (already done in @fig-raw-data-line-m4-3a). For a `map` object, `extract.samples()` returns a data.frame containing samples for each parameter in the posterior distribution. These samples are cleaned of dimension attributes and of the `lp__`, `dev`, and `log_lik` traces that are used internally. For `map` and other types, it uses the variance-covariance matrix and coefficients to define a multivariate Gaussian posterior to draw $n$ samples from.
3.  The code to the right of the `<-` takes its form from the equation
    $\mu_{i} = \alpha + \beta(x_{i} - \overline{x})$. The value of
    $x_{i}$ in this case is $50$.
4.  The result is a vector of predicted means, one for each random
    sample from the posterior. Since joint `a` and `b` went into
    computing each, the variation across those means incorporates the
    uncertainty in and correlation between both parameters.
    [@mcelreath2023a, p.103 and help file from `extract.samples()`]


::::
:::::


###### dens

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-density-m4-3a}
a: The quadratic approximate posterior distribution of the mean height, $\mu$, when weight is $50$ kg. This distribution represents the relative plausibility of different values of the mean (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-density-mean-50-m4-3a
#| fig-cap: "The quadratic approximate posterior distribution of the mean height, μ, when weight is 50 kg. This distribution represents the relative plausibility of different values of the mean (Original)"

## R code 4.51a ##################
rethinking::dens(mu_at_50_a, col = rethinking::rangi2, lwd = 2, xlab = "mu|weight=50")
```

::::
:::::

###### PI

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-PI-mu-at-50-m4-3a}
a: 89% compatibility interval of μ at 50 kg (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: PI-mu-at-50-m4-3a

## R code 4.52a ##############
rethinking::PI(mu_at_50_a, prob = 0.89)
```
::::
:::::

> “What these numbers mean is that the central 89% of the ways for the model to produce the data place the average height between about 159 cm and 160 cm (conditional on the model and data), assuming the weight is 50 kg. 
> That’s good so far, but we need to repeat the above calculation for every weight value on the horizontal axis, not just when it is 50 kg.” ([McElreath, 2020, p. 104](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=123&annotation=9XI839NS))
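
As a cross-check on what such an interval means, a percentile interval can be computed with base R alone. The sketch below assumes that an 89% percentile interval is simply the pair of 5.5% and 94.5% quantiles of the samples. The vector `sketch_mu` is simulated with made-up values and only stands in for `mu_at_50_a`; the helper `percentile_interval()` is hypothetical and not part of {**rethinking**}.

```r
set.seed(4)

# Simulated stand-in for mu_at_50_a (hypothetical mean and sd)
sketch_mu <- rnorm(1e4, mean = 159.1, sd = 0.34)

# Central interval that contains the share `prob` of the samples
percentile_interval <- function(samples, prob = 0.89) {
  tail_mass <- (1 - prob) / 2
  stats::quantile(samples, probs = c(tail_mass, 1 - tail_mass))
}

sketch_pi <- percentile_interval(sketch_mu)
sketch_pi

# Share of samples inside the interval; should be close to 0.89
mean(sketch_mu >= sketch_pi[1] & sketch_mu <= sketch_pi[2])
```

Repeating this calculation for every weight value then amounts to applying the same function to each column of a matrix of $\mu$ samples.
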

###### link1

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-link1-m4-3a}
a: Calculate $\mu$ for each case in the data and sample from the posterior distribution (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-link1-m4-3a

## R code 4.53a ##############
mu_352_m4.3a <- rethinking::link(m4.3a)
str(mu_352_m4.3a)
```

::::
:::::

> “You end up with a big matrix of values of $\mu$. Each row is a sample from the posterior distribution. The default is 1000 samples, but you can use as many or as few as you like. Each column is a case (row) in the data. There are 352 rows in `d2_a`, corresponding to 352 individuals. So there are 352 columns in the matrix `mu_352_m4.3a` above.” ([McElreath, 2020, p. 105](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=124&annotation=DC889A57))

> “The function `link()` provides a posterior distribution of $\mu$ for each case we feed it. So above we have a distribution of $\mu$ for each individual in the original data. We actually want something slightly different: a distribution of $\mu$ for each unique weight value on the horizontal axis. It’s only slightly harder to compute that, by just passing `link()` some new data.” ([McElreath, 2020, p. 105](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=124&annotation=MSEJ4D4N)) (See next tab "link2".)

###### link2

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-link2-m4-3a}
a: Calculate a distribution of $\mu$ for each unique weight value on the horizontal axis (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-link2-m4-3a

## R code 4.54a ###############################
# define sequence of weights to compute predictions for
# these values will be on the horizontal axis
weight.seq <- seq(from = 25, to = 70, by = 1)

# use link to compute mu
# for each sample from posterior
# and for each weight in weight.seq
mu_46_m4.3a <- rethinking::link(m4.3a, data = data.frame(weight = weight.seq))
str(mu_46_m4.3a)
```

::::
:::::

And now there are only 46 columns in `mu_46_m4.3a`, because we fed it 46
different values for weight.

###### plot1

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-dist-100-m4-3a}
a: The first 100 values in the distribution of $\mu$ at each weight value (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-dist-mu-height-100-m4-3a
#| fig-cap: "The first 100 values in the distribution of μ at each weight value. (Original)"

## R code 4.55a ##################
# use type="n" to hide raw data
base::plot(height ~ weight, d2_a, type = "n")

# loop over samples and plot each mu value
for (i in 1:100) {
  graphics::points(weight.seq, 
                   mu_46_m4.3a[i, ], 
                   pch = 16, 
                   col = rethinking::col.alpha(rethinking::rangi2, 0.1))
}

```


::::
:::::

At each weight value in `weight.seq`, a pile of computed $\mu$ values is
shown. Each of these piles is a Gaussian distribution, like that in tab "dens" of
@exm-chap04-generating-predictions-and-intervals-m4-3a. You can see now that the amount of
uncertainty in $\mu$ depends upon the value of `weight`. This is the
same fact you saw in @exm-chap04-uncertainty-around-mean.
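
Why the spread of $\mu$ depends on `weight` can be made explicit with the usual rule for the variance of a linear combination. Applied to $\mu_{i} = \alpha + \beta(x_{i} - \overline{x})$ and the joint posterior of $\alpha$ and $\beta$, it gives:

$$
\operatorname{Var}(\mu_{i}) = \operatorname{Var}(\alpha) + (x_{i} - \overline{x})^{2} \operatorname{Var}(\beta) + 2 (x_{i} - \overline{x}) \operatorname{Cov}(\alpha, \beta)
$$

If the covariance between $\alpha$ and $\beta$ is small, which is what centering the predictor is meant to achieve, the last term is negligible. The variance is then smallest near $x_{i} = \overline{x}$ and grows with the squared distance from the mean weight, which matches the wider piles at the extremes.
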

###### sum

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-sum-dist-weight-m4-3a}
a: Summary of the distribution for each weight value (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-sum-dist-weight-m4-3a
#| warning: false
#| results: hold

## R code 4.56a #####################
# summarize the distribution of mu
mu_46.mean_m4.3a <- apply(mu_46_m4.3a, 2, mean)         # 1
mu_46.PI_m4.3a <- apply(mu_46_m4.3a, 2, 
                      rethinking::PI, prob = 0.89)      # 2
str(mu_46_m4.3a)                                        # 3
head(mu_46.mean_m4.3a)                                  # 4
head(mu_46.PI_m4.3a)[ , 1:6]                            # 5
```

1.  Read `apply(mu_46_m4.3a, 2, mean)` as "compute the mean of each column
    (dimension '2') of the matrix `mu_46_m4.3a`". Now `mu_46.mean_m4.3a` contains the
    average $\mu$ at each weight value.
2.  `mu_46.PI_m4.3a` contains 89% lower and upper bounds for each weight value.
3.  Displaying the structure shows that there are only 46 columns in `mu_46_m4.3a`, because we fed it 46 different values for `weight`.
4.  Display the means of the first six columns of the matrix `mu_46_m4.3a` (= `mu_46.mean_m4.3a`).
5.  Display the 89% PI lower and upper bounds for the first six `weight` values. 

***
`mu_46.mean_m4.3a` and `mu_46.PI_m4.3a` are just different kinds of summaries of the distributions in `mu_46_m4.3a`, with each column being for a different weight value. These summaries are only summaries. The "estimate" is the entire distribution.


::::
:::::

###### plot2

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-sum-shaded-m4-3a}
a: The !Kung height data with 89% compatibility interval of the mean indicated by the shaded region (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-sum-shaded-m4-3a
#| fig-cap: "The !Kung height data with 89% compatibility interval of the mean indicated by the shaded region. Compare with tab 'plot1'"
#| results: hold

## R code 4.57a ###########################
# plot raw data
# fading out points to make line and interval more visible
plot(height ~ weight, data = d2_a, col = rethinking::col.alpha(rethinking::rangi2, 0.5))

# plot the MAP line, aka the mean mu for each weight
graphics::lines(weight.seq, mu_46.mean_m4.3a)

# plot a shaded region for 89% PI
rethinking::shade(mu_46.PI_m4.3a, weight.seq)

```

::::
:::::

###### link3

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-own-link-m4-3a}
a: Writing your own `link()` function (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-own-link-m4-3a
#| results: hold


## R code 4.58a ################################
post_m4.3a <- rethinking::extract.samples(m4.3a)
mu.link_m4.3a <- function(weight) post_m4.3a$a + post_m4.3a$b * (weight - xbar)
weight.seq <- seq(from = 25, to = 70, by = 1)
mu3_m4.3a <- sapply(weight.seq, mu.link_m4.3a)
mu3.mean_m4.3a <- apply(mu3_m4.3a, 2, mean)
mu3.CI_m4.3a <- apply(mu3_m4.3a, 2, rethinking::PI, prob = 0.89)
head(mu3.mean_m4.3a)
head(mu3.CI_m4.3a)[ , 1:6]
```

And the values in `mu3.mean_m4.3a` and `mu3.CI_m4.3a` should be very similar
(allowing for simulation variance) to what you got the automated way,
using `rethinking::link()` in tab 'sum'.
::::
:::::


:::

::::
:::::


What follows now is the tidyverse procedure. Since we used `weight_c` to fit our model, we might first want to understand what exactly the mean value is for weight: `mean(d2_b$weight)` = `r mean(d2_b$weight)`. "Just a hair under 45. If we're interested in $\mu$ at `weight` = 50, that implies we're also interested in $\mu$ at `weight_c` = 5.01. Within the context of our model, we compute this with
$\alpha + \beta \cdot 5.01$." ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#plotting-regression-intervals-and-contours.))

:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-chap04-generating-predictions-and-intervals-m4-3b}
b: Plotting regression intervals and contours (Tidyverse)
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### mean

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-predict-mean-50-m4-3b}
b: Calculate the uncertainty around the average regression line at a weight of 50 kg (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-predict-mean-50-m4-3b
#| warning: false

mu_at_50_b <- 
  post_m4.3b |> 
  dplyr::mutate(mu_at_50_b = b_Intercept + b_weight_c * 5.01, .keep = "none")
 
head(mu_at_50_b)
```


::::
:::::


###### dens

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-density-m4-3b}
b: The posterior distribution of the mean height, $\mu$, when weight is $50$ kg. This distribution represents the relative plausibility of different values of the mean (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-density-m4-3b
#| fig-cap: "The posterior distribution of the mean height, μ, when weight is 50 kg, here based on HMC draws rather than a quadratic approximation. This distribution represents the relative plausibility of different values of the mean. Tidyverse version"

mu_at_50_b |>
  ggplot2::ggplot(ggplot2::aes(x = mu_at_50_b)) +
  tidybayes::stat_halfeye(point_interval = tidybayes::mode_hdi, .width = .95,
               fill = "deepskyblue") +
  ggplot2::scale_y_continuous(NULL, breaks = NULL) +
  ggplot2::xlab(expression(mu["height | weight = 50"])) +
  ggplot2::theme_bw()
```
Here we expressed the 95% HPDIs on the density plot with `tidybayes::stat_halfeye()`. Since `tidybayes::stat_halfeye()` also returns a point estimate, we'll throw in the mode.

::::
:::::

###### HPDI

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-PI-mu-at-50-m4-3b}
b: 89% and 95% Highest Posterior Density Intervals (HPDIs) of $\mu$ at 50 kg (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-PI-mu-at-50-m4-3a

tidybayes::mean_hdi(mu_at_50_b[, 1], .width = c(.89, .95))
```


::::
:::::

###### fitted1

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-fitted1-m4-3b}
b: Calculate $\mu$ for each case in the data and sample from the posterior distribution (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-fitted1-m4-3b

mu2_m4.3b <- brms:::fitted.brmsfit(m4.3b, summary = F)
str(mu2_m4.3b)
```

***

With {**brms**} the equivalent of `rethinking::link()` (tab "link1" in @exm-chap04-generating-predictions-and-intervals-m4-3a) is the `fitted()` function.

With `brms:::fitted.brmsfit()`, it's quite easy to plot a regression
line and its intervals. Just omit the `summary = F` argument, so that the
default `summary = TRUE` returns the estimate together with its interval
bounds.


> "When you specify `summary = F`, `brms:::fitted.brmsfit()` returns a
matrix of values with as many rows as there were post-warmup draws
across your Hamiltonian Monte Carlo (`r glossary("HMC")`) chains and as many columns as
there were cases in your analysis. Because we had 4,000 post-warmup
draws and $n=352$, `brms:::fitted.brmsfit()` returned a matrix of 4,000
rows and 352 vectors. If you omitted the `summary = F` argument, the
default is TRUE and `brms:::fitted.brmsfit()` will return summary
information instead." ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#plotting-regression-intervals-and-contours.))

::::
:::::


:::::{.my-watch-out}
:::{.my-watch-out-header}
WATCH OUT! How to apply `fitted()`?
:::
::::{.my-watch-out-container}
Kurz applies the function `fitted()` in the code, but in the text he
twice uses `brms::fitted()`, which doesn't exist. I used
`brms:::fitted.brmsfit()`. 

But with `stats::fitted()` you will get the same result!

The object `m4.3b` is of class `brmsfit` but in the help file of
`stats::fitted()` you can read: "`fitted` is a generic function which
extracts fitted values from objects returned by modeling functions.
**All object classes which are returned by model fitting functions
should provide a `fitted` method.**" ([Extract Model Fitted Values](https://stat.ethz.ch/R-manual/R-devel/library/stats/html/fitted.values.html), emphasis is mine)

My interpretation therefore is that `stats::fitted()`, being a generic,
dispatches to the method `brms:::fitted.brmsfit()` for objects of class
`brmsfit`. That's why the results are identical.
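
The mechanism behind this is R's S3 method dispatch, which can be sketched with a toy class. The class `toy_fit` and its method below are hypothetical and exist only for this illustration; nothing here comes from {**brms**}.

```r
# A hypothetical model-like object of class "toy_fit"
toy <- structure(list(values = c(1.5, 2.5, 3.5)), class = "toy_fit")

# An S3 method for the generic stats::fitted(); dispatch looks for
# a function named <generic>.<class>
fitted.toy_fit <- function(object, ...) {
  object$values
}

# Calling the generic hands the work over to fitted.toy_fit()
via_generic <- stats::fitted(toy)
via_method  <- fitted.toy_fit(toy)
identical(via_generic, via_method)
```

In the same way, calling `stats::fitted()` or plain `fitted()` on a `brmsfit` object ends up in the method that {**brms**} provides for that class.
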
::::
:::::

###### fitted2

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-fitted2-m4-3b}
b: Calculate a distribution of $\mu$ for each unique weight value on the horizontal axis (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-fitted2-m4-3b

weight_seq <- 
  tibble::tibble(weight = 25:70) |> 
  dplyr::mutate(weight_c = weight - base::mean(d2_b$weight))

mu3_m4.3b <-
  brms:::fitted.brmsfit(m4.3b,
         summary = F,
         newdata = weight_seq) |>
  tibble::as_tibble() |>
  # here we name the columns after the `weight` values from which they were computed
  rlang::set_names(25:70) |> 
  dplyr::mutate(iter = 1:dplyr::n())

mu3_m4.3b[1:6, 1:6]
```

***

Much like `rethinking::link()`, `brms:::fitted.brmsfit()` can
accommodate custom predictor values with its `newdata` argument.

To differentiate: The {rethinking} version used the variable `weight.seq` whereas here I am using `weight_seq`.

::::
:::::

###### plot1

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-dist-100-m4-3b}
b: The first 100 values in the distribution of $\mu$ at each weight value (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-dist-100-m4-3b
#| fig-cap: "The first 100 values in the distribution of μ at each weight value (Tidyverse)"

mu4_m4.3b <- 
  mu3_m4.3b |>
  tidyr::pivot_longer(-iter,
               names_to = "weight",
               values_to = "height") |> 
  # we might reformat `weight` to numerals
  dplyr::mutate(weight = base::as.numeric(weight))

d2_b |>
  ggplot2::ggplot(ggplot2::aes(x = weight, y = height)) +
  ggplot2::geom_point(data = mu4_m4.3b |> dplyr::filter(iter < 101), 
             color = "navyblue", alpha = .05) +
  ggplot2::coord_cartesian(xlim = c(30, 65)) +
  ggplot2::theme_bw()


```
::::
:::::

For this plot we did a little more data processing with the aid
of `tidyr::pivot_longer()`, which converts the data from the wide
format to the long format.
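
What this reshaping does can be seen on a tiny example. The sketch below uses base R only, not `tidyr::pivot_longer()`, and the data frame `sketch_wide` with its two weight columns contains made-up numbers. The row order of the result may differ from what `tidyr::pivot_longer()` returns.

```r
# Wide format: one row per draw, one column per weight value (made-up numbers)
sketch_wide <- data.frame(
  iter = 1:3,
  `25` = c(136.5, 136.9, 136.2),
  `26` = c(137.4, 137.8, 137.1),
  check.names = FALSE
)

# Long format: one row per combination of draw and weight value
weight_cols <- setdiff(names(sketch_wide), "iter")
sketch_long <- data.frame(
  iter   = rep(sketch_wide$iter, times = length(weight_cols)),
  weight = rep(as.numeric(weight_cols), each = nrow(sketch_wide)),
  height = unlist(sketch_wide[weight_cols], use.names = FALSE)
)

sketch_long
```

The long format has one `weight` and one `height` column, which is the shape that `ggplot2::aes(x = weight, y = height)` expects.
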

:::::{.my-resource}
:::{.my-resource-header}
Wide and long data
:::
::::{.my-resource-container}
If you are new to the distinction between wide and long data, you can
learn more from:

- [Pivot data from wide to
long](https://tidyr.tidyverse.org/reference/pivot_longer.html) vignette
from the tidyverse team (2020); 
- Simon Ejdemyr's blog post, [Wide & long
data](https://sejdemyr.github.io/r-tutorials/basics/wide-and-long/) or
- Karen Grace-Martin's blog post, [The wide and long data format for
repeated measures
data](https://www.theanalysisfactor.com/wide-and-long-data/).
::::
:::::

###### fitted3

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-fitted3-mean-m4-3b}
b: Draws predicted *mean* response values from the posterior predictive distribution (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-fitted3-mean-m4-3b

mu_summary_m4.3b <-
  brms:::fitted.brmsfit(m4.3b, 
         newdata = weight_seq,
         probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq)

set.seed(4)
mu_summary_m4.3b |> 
  dplyr::slice_sample(n = 6)
```


:::::{.my-note}
:::{.my-note-header}
How to change the {**brms**} default CI value from 95% to 89%?
:::
::::{.my-note-container}

In {**brms**} I had to set the `probs` argument to `c(0.055, 0.945)`; the third and fourth columns of the object returned by `fitted()` are then named `Q5.5` and `Q94.5`. This was necessary to get the same 89% interval as in the book version. The default `probs` value for `brms:::fitted.brmsfit()` would have been `c(0.025, 0.975)`, resulting in quantiles named `Q2.5` and `Q97.5`. The Q prefix stands for quantile. See [Rename summary columns of predict() and related methods](https://github.com/paul-buerkner/brms/issues/425).

::::
:::::


::::
:::::

###### plot2

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-sum-shaded-m4-3b}
b: The !Kung height data with 89% compatibility interval of the mean indicated by the shaded region (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-sum-shaded-m4-3b
#| fig-cap: "Plot of the summaries on top of the !Kung height data again, now with 89% compatibility interval of the mean indicated by the shaded region. Compare this region to the distributions of blue points in tab 'plot1'"


d2_b |>
  ggplot2::ggplot(ggplot2::aes(x = weight, y = height)) +
  ggplot2::geom_smooth(data = mu_summary_m4.3b,
              ggplot2::aes(y = Estimate, ymin = Q5.5, ymax = Q94.5),
              stat = "identity",
              fill = "grey70", color = "black", alpha = 1, linewidth = 1/2) +
  ggplot2::geom_point(color = "navyblue", shape = 1, size = 1.5, alpha = 2/3) +
  ggplot2::coord_cartesian(xlim = range(d2_b$weight)) +
  ggplot2::theme_bw()
```

::::
:::::

If you wanted intervals other than the 89% ones used here, you'd change
the `probs` argument, for example:
`brms:::fitted.brmsfit(m4.3b, newdata = weight_seq, probs = c(.25, .75))`.
The resulting third and fourth vectors from the `fitted()` object would
then be named `Q25` and `Q75` instead of `Q5.5` and `Q94.5`. The
[Q prefix](https://github.com/paul-buerkner/brms/issues/425) stands for
quantile.
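
How the `probs` values turn into column names can be sketched in base R. The helper `summarise_draws()` below is hypothetical (it is not a {**brms**} function) and the draws are simulated; it only mimics the `Estimate` / `Est.Error` / `Q…` layout of the `fitted()` summary.

```r
# Hypothetical helper, NOT part of brms: summarises a draws-by-cases matrix
# in the same column layout as the fitted() summary.
summarise_draws <- function(draws, probs = c(0.025, 0.975)) {
  out <- cbind(
    Estimate  = colMeans(draws),
    Est.Error = apply(draws, 2, stats::sd),
    t(apply(draws, 2, stats::quantile, probs = probs, names = FALSE))
  )
  # column names are built from the probabilities, e.g. 0.055 -> "Q5.5"
  colnames(out)[3:4] <- paste0("Q", probs * 100)
  out
}

set.seed(4)
# simulated stand-in for posterior draws of mu at three weights (assumption)
fake_draws <- matrix(stats::rnorm(3000, mean = 150, sd = 1), ncol = 3)

colnames(summarise_draws(fake_draws))                          # 95% interval
colnames(summarise_draws(fake_draws, probs = c(0.055, 0.945))) # 89% interval
colnames(summarise_draws(fake_draws, probs = c(0.25, 0.75)))   # 50% interval
```

The point of the sketch is only that the interval width lives in `probs`, and the `Q` column names follow from it.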

Similar to `rethinking::link()`, `brms:::fitted.brmsfit()` uses the
formula from your model to compute the model expectations for a given
set of predictor values. I used it a lot in this project. If you follow
along, you'll get a good handle on it. 

:::::{.my-resource}
:::{.my-resource-header}
How to use `brms:::fitted.brmsfit()`
:::
::::{.my-resource-container}

To dive deeper into the `fitted()` function, you can [consult the
documentation](https://rdrr.io/cran/brms/man/fitted.brmsfit.html).
Though Kurz doesn't use it in his project, he informs {**brms**} users that `fitted()` is also an alias for the `brms::posterior_epred()` function, about which you might [learn more here](https://rdrr.io/cran/brms/man/posterior_epred.brmsfit.html). Users
can always learn more about them and other functions in the [{**brms**}
reference manual](https://cran.r-project.org/web/packages/brms/brms.pdf).

::::
:::::


:::

::::
:::::


#### Prediction intervals

> “What you’ve done so far is just use samples from the posterior to visualize the uncertainty in $μ_{i}$, the linear model of the mean.” ([McElreath, 2020, p. 107](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=126&annotation=DK5AQD7T))

> “Now let’s walk through generating an 89% prediction interval for actual heights, not just the average height, $\mu$. This means we’ll incorporate the standard deviation $\sigma$ and its uncertainty as well.” ([McElreath, 2020, p. 107](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=126&annotation=YPK9QEW5))


::: my-procedure
::: my-procedure-header
::: {#prp-chap04-sim-heights-m3-4}
: Simulating heights
:::
:::

::: my-procedure-container
1.  For any unique weight value, you sample from a Gaussian distribution
    -   with the correct mean $\mu$ for that weight,
    -   using the correct value of $\sigma$ sampled from the posterior
        distribution.
2.  Do this
    -   for every sample from the posterior,
    -   for every weight value.
3.  You will end up with a collection of simulated heights
    -   that embody the uncertainty in the posterior
    -   as well as the uncertainty in the Gaussian distribution of
        heights.
:::
:::

:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-chap04-predict-intervals}
: Generate an 89% prediction interval for actual heights
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### sim1 (O)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-sim-post-m4-3a}
a: Simulation of the posterior observations (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-sim-post-m4-3a

## R code 4.59a #######################
sim.height_m4.3a <- rethinking::sim(m4.3a, data = list(weight = weight.seq))
str(sim.height_m4.3a)
```

This matrix is much like the earlier one in tab 'sum' of @exm-chap04-generating-predictions-and-intervals-m4-3a, but it contains
simulated heights, not distributions of plausible average height, $\mu$.


::::
:::::


###### PI (O)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-sum-pi-heights-m4-3a}
a: Summarize the simulated heights as an 89% posterior prediction interval of
observable heights (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-sum-pi-heights-m4-3a

## R code 4.60a ###################
height.PI_m4.3a <- apply(sim.height_m4.3a, 2, rethinking::PI, prob = 0.89)
```


`height.PI_m4.3a` contains the 89% posterior prediction interval of
observable (according to the model) heights, across the values of weight
in `weight.seq`.

::::
:::::

###### plot1 (O)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-plot-heights-m4-3a}
a: 89% prediction interval for height, as a function of weight by sampling 1e3 times (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-plot-heights-m4-3a
#| fig-cap: "89% prediction interval for height, as a function of weight. The solid line is the average line for the mean height at each weight. The two shaded regions show different 89% plausible regions. The narrow shaded interval around the line is the distribution of μ. The wider shaded region represents the region within which the model expects to find 89% of actual heights in the population, at each weight."
#| results: hold

## R code 4.61a ##################
# plot raw data
plot(height ~ weight, d2_a, col = rethinking::col.alpha(rethinking::rangi2, 0.5))

# draw MAP line
graphics::lines(weight.seq, mu_46.mean_m4.3a)

# draw PI region for line
# rethinking::shade(mu_46.PI_m4.3a, weight.seq)

# draw HPDI region for line
mu_46.HPDI_m4.3a <- apply(mu_46_m4.3a, 2, 
                      rethinking::HPDI, prob = .89)
rethinking::shade(mu_46.HPDI_m4.3a, weight.seq)

# draw PI region for simulated heights
rethinking::shade(height.PI_m4.3a, weight.seq)


```

> “Notice that the outline for the wide shaded interval is a little rough. This is the simulation variance in the tails of the sampled Gaussian values. If it really bothers you, increase the number of samples you take from the posterior distribution. The optional $n$ parameter for `sim.height_m4.3a` controls how many samples are used.” ([McElreath, 2020, p. 109](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=128&annotation=69UVQJZI))

Try for example $1e4$ samples. 

> “Run the plotting code again, and you’ll see the shaded boundary smooth out some.” ([McElreath, 2020, p. 109](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=128&annotation=ACYS7GCU)) 

See the next tab 'plot2'.

::::
:::::

###### plot2 (O)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-plot-heights2-m4-3a}
a: 89% prediction interval for height, as a function of weight by sampling 1e4 times (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-plot-heights2-m4-3a
#| fig-cap: "89% prediction interval for height, as a function of weight. Shaded boundary smoothed out by sampling 1e4 times instead of the standard value of 1e3"

## R code 4.59a adapted ################
sim2.height_m4.3a <- rethinking::sim(m4.3a, data = list(weight = weight.seq), n = 1e4)

## R code 4.60a adapted ################
height2.PI_m4.3a <- apply(sim2.height_m4.3a, 2, rethinking::PI, prob = 0.89)

## R code 4.61a ##################

# plot raw data
plot(height ~ weight, d2_a, col = rethinking::col.alpha(rethinking::rangi2, 0.5))

mu_46.HPDI_m4.3a <- apply(mu_46_m4.3a, 2,
                      rethinking::HPDI, prob = 0.89)

# draw MAP line
graphics::lines(weight.seq, mu_46.mean_m4.3a)

# draw HPDI region for line
rethinking::shade(mu_46.HPDI_m4.3a, weight.seq)

# draw PI region for simulated heights
rethinking::shade(height2.PI_m4.3a, weight.seq)
```
::::
:::::

> “With extreme percentiles, it can be very hard to get out all of the roughness. Luckily, it hardly matters, except for aesthetics. Moreover, it serves to remind us that all statistical inference is approximate. The fact that we can compute an expected value to the 10th decimal place does not imply that our inferences are precise to the 10th decimal place” ([McElreath, 2020, p. 109](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=128&annotation=NZMP7RDP))
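
The roughness is Monte Carlo error in the tail quantiles, and its standard error shrinks roughly with the square root of the number of draws. The following base-R sketch illustrates this with a toy standard normal instead of the simulated heights; `quantile_spread()` is a hypothetical helper introduced only for this illustration.

```r
set.seed(4)

# Hypothetical helper: spread (standard deviation) of the estimated 5.5%
# quantile across `reps` repeated simulations of size `n`
quantile_spread <- function(n, reps = 200) {
  estimates <- replicate(
    reps,
    stats::quantile(stats::rnorm(n), probs = 0.055, names = FALSE)
  )
  stats::sd(estimates)
}

spread_1e3 <- quantile_spread(1e3)
spread_1e4 <- quantile_spread(1e4)

# ten times more draws should shrink the spread by a factor near sqrt(10)
c(n_1e3 = spread_1e3, n_1e4 = spread_1e4, ratio = spread_1e3 / spread_1e4)
```

So ten times more samples buy only about three times less wiggle in the interval boundary, which is why some roughness usually remains.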

###### sim2 (O)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-sim2-heights-m4-3a}
a: Writing your own `rethinking::sim()` function (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: writing-sim-function-a
#| results: hold

## R code 4.63a ########################################

# post <- extract.samples(m4.3)
# weight.seq <- 25:70
post_m4.3a <- rethinking::extract.samples(m4.3a)
sim.height2_m4.3a <- sapply(weight.seq, function(weight) {
  rnorm(
    n = nrow(post_m4.3a),
    mean = post_m4.3a$a + post_m4.3a$b * (weight - xbar),
    sd = post_m4.3a$sigma
  )
})
height2.PI_m4.3a <- apply(sim.height2_m4.3a, 2, rethinking::PI, prob = 0.89)
head(height.PI_m4.3a)[ , 1:6]
head(height2.PI_m4.3a)[ , 1:6]
```

::::
:::::

The small differences are the result of the randomized sampling process.


###### predict1 (T)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-predict-post-m4-3b}
b: Predict the posterior observations (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-predict-post-m4-3b

pred_height_m4.3b <-
  brms:::predict.brmsfit(m4.3b,
          newdata = weight_seq,
          probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq)

set.seed(4)
pred_height_m4.3b |>
  dplyr::slice_sample(n = 6)
```

The `predict()` code looks a lot like what we used in tab 'fitted3' of @exm-chap04-generating-predictions-and-intervals-m4-3b.


::::
:::::

:::::{.my-important}
:::{.my-important-header}
{**brms**} equivalents of `rethinking::link()` and `rethinking::sim()`
:::
::::{.my-important-container}
Much as `brms:::fitted.brmsfit()` was our analogue to
`rethinking::link()`, `brms:::predict.brmsfit()` is our analogue to
`rethinking::sim()`. ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#prediction-intervals.))

***

- **link**: Computes the value of each linear model at each sample for each case in the data.
- **sim**: Uses the model definition from a `map` or `map2stan` fit to simulate outcomes that average over the posterior distribution. 
- **fitted.brmsfit**: Compute posterior draws of the expected value of the posterior predictive distribution. Returns an array of predicted *mean* response values.
- **predict.brmsfit**: Compute posterior draws of the posterior predictive distribution. Returns an array of predicted response values. 


::::
:::::
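
The difference between the two families of functions can be sketched at a single weight value. The draws below are made-up stand-ins for a posterior (an assumption, not output of `m4.3a` or `m4.3b`): the first computation mirrors what `link()` / `fitted()` summarise, the second what `sim()` / `predict()` summarise.

```r
set.seed(4)
n_draws <- 4000

# Made-up stand-ins for posterior draws (assumption, not the fitted model)
a_fake     <- stats::rnorm(n_draws, mean = 154.6, sd = 0.3)
b_fake     <- stats::rnorm(n_draws, mean = 0.9, sd = 0.04)
sigma_fake <- abs(stats::rnorm(n_draws, mean = 5.1, sd = 0.2)) # keep sd positive
weight_c   <- 5   # a weight 5 kg above the mean weight

# link() / fitted(): draws of the expected value mu
mu_draws <- a_fake + b_fake * weight_c

# sim() / predict(): draws of new observations scattered around mu
height_draws <- stats::rnorm(n_draws, mean = mu_draws, sd = sigma_fake)

rbind(
  mean_interval       = stats::quantile(mu_draws, probs = c(0.055, 0.945)),
  prediction_interval = stats::quantile(height_draws, probs = c(0.055, 0.945))
)
```

The prediction interval is much wider because it adds the observation noise $\sigma$ on top of the uncertainty about $\mu$.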

###### plot1 (T)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-predict-plot1-m4-3b}
b: Plot 89% prediction interval for height, as a function of weight with 2e3 iterations (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-predict-plot1-m4-3b
#| fig-cap: "89% prediction interval for height, as a function of weight. The solid line is the average line for the mean height at each weight. The two shaded regions show different 89% plausible regions. The narrow shaded interval around the line is the distribution of μ. The wider shaded region represents the region within which the model expects to find 89% of actual heights in the population, at each weight. (Tidyverse)"

d2_b |>
  ggplot2::ggplot(ggplot2::aes(x = weight)) +
  ggplot2::geom_ribbon(data = pred_height_m4.3b, 
              ggplot2::aes(ymin = Q5.5, ymax = Q94.5),
              fill = "grey83") +
  ggplot2::geom_smooth(data = mu_summary_m4.3b,
              ggplot2::aes(y = Estimate, ymin = Q5.5, ymax = Q94.5),
              stat = "identity",
              fill = "grey70", color = "black", alpha = 1, linewidth = 1/2) +
  ggplot2::geom_point(ggplot2::aes(y = height),
             color = "navyblue", shape = 1, size = 1.5, alpha = 2/3) +
  ggplot2::coord_cartesian(xlim = base::range(d2_b$weight),
                  ylim = base::range(d2_b$height)) +
  ggplot2::theme_bw()
```
To smooth out the rough shaded interval in the {**brms**} approach, we
have to refit `m4.3b` as `m4.3b_smooth` with a larger number of post-warmup iterations, which is controlled by the `iter` and `warmup`
parameters.

::::
:::::

###### plot2 (T)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-predict-plot2-m4-3b}
b: Plot 89% prediction interval for height, as a function of weight with 2e4 iterations (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-predict-plot2-m4-3b
#| fig-cap: "89% prediction interval for height, as a function of weight. The solid line is the average line for the mean height at each weight. The two shaded regions show different 89% plausible regions. The narrow shaded interval around the line is the distribution of μ. The wider shaded region represents the region within which the model expects to find 89% of actual heights in the population, at each weight. (Tidyverse)"
#| cache: true

m4.3b_smooth <- 
  brms::brm(data = d2_b, 
      family = gaussian,
      height ~ 1 + weight_c,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b, lb = 0),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 20000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.03b_smooth")

mu_summary_m4.3b_smooth <-
  brms:::fitted.brmsfit(m4.3b_smooth, 
         newdata = weight_seq,
         probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq)

pred_height_m4.3b_smooth <-
  brms:::predict.brmsfit(m4.3b_smooth,
          newdata = weight_seq,
          probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq)


d2_b |>
  ggplot2::ggplot(ggplot2::aes(x = weight)) +
  ggplot2::geom_ribbon(data = pred_height_m4.3b_smooth, 
              ggplot2::aes(ymin = Q5.5, ymax = Q94.5),
              fill = "grey83") +
  ggplot2::geom_smooth(data = mu_summary_m4.3b_smooth,
              ggplot2::aes(y = Estimate, ymin = Q5.5, ymax = Q94.5),
              stat = "identity",
              fill = "grey70", color = "black", alpha = 1, linewidth = 1/2) +
  ggplot2::geom_point(ggplot2::aes(y = height),
             color = "navyblue", shape = 1, size = 1.5, alpha = 2/3) +
  ggplot2::coord_cartesian(xlim = base::range(d2_b$weight),
                  ylim = base::range(d2_b$height)) +
  ggplot2::theme_bw()
```


::::
:::::

This is the same graphic as in tab 'plot1', but with ten times as many iterations to smooth out the rough shaded interval.

###### predict2 (T)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-predict2-manually-m4-3b}
b: Write your own predict for the posterior observations: Model-based predictions without {**brms**} and `predict()`: mean with quantiles of 0.055 and 0.945 (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-predict2-manually-m4-3b
#| fig-cap: "Model-based predictions without {brms} and predict(): mean with quantiles of 0.055 and 0.945"

set.seed(4)

post_m4.3b |> 
  tidyr::expand_grid(weight = 25:70) |> 
  dplyr::mutate(weight_c = weight - base::mean(d2_b$weight)) |> 
  dplyr::mutate(sim_height = stats::rnorm(dplyr::n(),
                            mean = b_Intercept + b_weight_c * weight_c,
                            sd   = sigma)) |> 
  dplyr::group_by(weight) |> 
  dplyr::summarise(mean = base::mean(sim_height),
            ll   = stats::quantile(sim_height, prob = .055),
            ul   = stats::quantile(sim_height, prob = .945)) |> 
  
  # plot
  ggplot2::ggplot(ggplot2::aes(x = weight)) +
  ggplot2::geom_smooth(ggplot2::aes(y = mean, ymin = ll, ymax = ul),
              stat = "identity",
              fill = "grey83", color = "black", alpha = 1, linewidth = 1/2) +
  ggplot2::geom_point(data = d2_b,
             ggplot2::aes(y = height),
             color = "navyblue", shape = 1, size = 1.5, alpha = 2/3) +
  ggplot2::coord_cartesian(xlim = base::range(d2_b$weight),
                           ylim = base::range(d2_b$height)) +
  ggplot2::theme_bw()
```
Here we followed McElreath's example and calculated our model-based predictions "by
hand". Instead of relying on base R `apply()` and `sapply()`, the main
action in the {**tidyverse**} approach is in `tidyr::expand_grid()`, the second
`dplyr::mutate()` line with `stats::rnorm()`, and the `dplyr::group_by()` + `dplyr::summarise()` combination.

We specifically left out the `fitted()` intervals to make it more
apparent what we were simulating. 

::::
:::::

###### predict3 (T)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-predict3-manually-m4-3b}
b: Write your own predict for the posterior observations, now using {**tidybayes**}: mode with 89% HDI (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-predict3-manually-m4-3b
#| fig-cap: "Model-based predictions without {brms} and predict(), now using {tidybayes}: mode with 89% HDI"


set.seed(4)

post_m4.3b |> 
  tidyr::expand_grid(weight = 25:70) |> 
  dplyr::mutate(weight_c = weight - base::mean(d2_b$weight)) |> 
  dplyr::mutate(sim_height = stats::rnorm(dplyr::n(),
                            mean = b_Intercept + b_weight_c * weight_c,
                            sd   = sigma)) |> 
  # dplyr::group_by(weight) |> 
  # dplyr::summarise(mean = base::mean(sim_height),
  #           ll   = stats::quantile(sim_height, prob = .055),
  #           ul   = stats::quantile(sim_height, prob = .945)) |> 
  
  dplyr::group_by(weight) |> 
  tidybayes::mode_hdi(sim_height, .width = .89) |> 
  
  # plot
  ggplot2::ggplot(ggplot2::aes(x = weight)) +
  ## instead of "aes(y = mean, ymin = ll, ymax = ul)":
  ## the point estimate is in `sim_height`; `.point` only holds the label "mode"
  ggplot2::geom_smooth(ggplot2::aes(y = sim_height, ymin = .lower, ymax = .upper),
              stat = "identity",
              fill = "grey83", color = "black", alpha = 1, linewidth = 1/2) +
  ggplot2::geom_point(data = d2_b,
             ggplot2::aes(y = height),
             color = "navyblue", shape = 1, size = 1.5, alpha = 2/3) +
  ggplot2::coord_cartesian(xlim = base::range(d2_b$weight),
                           ylim = base::range(d2_b$height)) +
  ggplot2::labs(y = "mode") +
  ggplot2::theme_bw()
```
Here we replaced the three-line `dplyr::summarise()` code with a single line of
`tidybayes::mean_qi(sim_height)`, or whatever combination of central
tendency and interval type you wanted (here we used: `tidybayes::mode_hdi(sim_height, .width = .89)`).
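
What the two interval types compute can be sketched in base R on a skewed toy sample. `hdi_sketch()` is a hypothetical helper written for this illustration, not the {**tidybayes**} implementation, and it is only sensible for unimodal samples.

```r
# Hypothetical helper: the shortest interval that contains a share `width`
# of the draws (a simple highest-density interval for unimodal samples)
hdi_sketch <- function(x, width = 0.89) {
  x_sorted <- sort(x)
  n        <- length(x_sorted)
  n_inside <- ceiling(width * n)            # draws the interval must cover
  starts   <- seq_len(n - n_inside + 1)     # candidate lower end points
  widths   <- x_sorted[starts + n_inside - 1] - x_sorted[starts]
  best     <- which.min(widths)             # narrowest candidate wins
  c(lower = x_sorted[best], upper = x_sorted[best + n_inside - 1])
}

set.seed(4)
skewed <- stats::rlnorm(1e4, meanlog = 0, sdlog = 1)  # right-skewed toy sample

# quantile interval: cuts 5.5% of the draws from each tail
qi  <- stats::quantile(skewed, probs = c(0.055, 0.945), names = FALSE)
# highest-density interval: narrowest region holding 89% of the draws
hdi <- hdi_sketch(skewed, width = 0.89)

rbind(quantile_interval = qi, highest_density = hdi)
```

For the nearly symmetric simulated heights in this chapter both interval types almost coincide; the difference only becomes visible for skewed distributions like the toy sample.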


::::
:::::


:::

::::
:::::


## Curves from lines

> “We’ll consider two commonplace methods that use linear regression to build curves. The first is `r glossary("polynomial regression")`. The second is `r glossary("spline", "b-splines")`.” ([McElreath, 2020, p. 110](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=129&annotation=TDHV7FFZ))

### Polynomial regression

> “Polynomial regression uses powers of a variable—squares and cubes—as extra predictors.” ([McElreath, 2020, p. 110](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=129&annotation=NBTWLNYX))


#### Looking at the full !Kung data (Scatterplot)

:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-fig-scatterplot-height-weight}
: Scatterplot of height against weight
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### Original

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-scatterplot-height-weight-a}
a: Height in centimeters (vertical) plotted against weight in kilograms (horizontal) (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-scatterplot-height-weight-a
#| fig-cap: "Height in centimeters (vertical) plotted against weight in kilograms (horizontal). This time with the full data, i.e., including the non-adults."
#| attr-source: '#fig-scatterplot-height-weight-a lst-cap="Height in centimeters (vertical) plotted against weight in kilograms (horizontal): rethinking version"'

plot(height ~ weight, data = d_a, col = rethinking::rangi2)
```

::::
:::::


###### Tidyverse

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-scatterplot-height-weight-b}
b: Height in centimeters (vertical) plotted against weight in kilograms (horizontal) (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-scatterplot-height-weight-b
#| fig-cap: "Height in centimeters (vertical) plotted against weight in kilograms (horizontal). This time with the full data, i.e., including the non-adults. (Tidyverse)"

d_b |> 
  ggplot2::ggplot(ggplot2::aes(x = weight, y = height)) +
  ggplot2::geom_point(color = "navyblue", shape = 1, size = 1.5, alpha = 2/3) +
  ggplot2::annotate(geom = "text",
           x = 42, y = 115,
           label = "This relation is\nvisibly curved.",
           family = "Times") +
  ggplot2::theme_bw()
```

::::
:::::

:::

The relationship is visibly curved, now that we've included the
non-adult individuals. (Compare with adult data in @fig-raw-data-line-m4-3a).

::::
:::::


:::::{.my-note}
:::{.my-note-header}
What is a quadratic polynomial?
:::
::::{.my-note-container}
> A quadratic function (also called a quadratic, a quadratic polynomial,
> or a polynomial of degree 2) is a special type of polynomial function
> where the highest-degree term is second degree. ... The graph of a
> quadratic function is a parabola, a 2-dimensional curve that looks
> like either a cup(∪) or a cap(∩). ([Statistics How
> To](https://www.statisticshowto.com/quadratic-function/))
::::
:::::


:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-ID-text}
: Parabolic and cubic equation for the mean height
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### Parabolic


:::::{.my-theorem}
:::{.my-theorem-header}
:::::: {#thm-parabolic-mean}
: Parabolic equation for the mean height
::::::
:::
::::{.my-theorem-container}

> “The most common polynomial regression is a parabolic model of the mean. Let x be standardized body weight.” ([McElreath, 2020, p. 110](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=129&annotation=SKF6GCYX))


$$
\mu_{i} = \alpha + \beta_{1}x_{i} + \beta_{2}x_{i}^2
$$ {#eq-parabolic-mean}

> “The above is a parabolic (second order) polynomial. The $\alpha + \beta_{1}x_{i}$ part is the same linear function of $x$ in a linear regression, just with a little '1' subscript added to the parameter name, so we can tell it apart from the new parameter. The additional term uses the square of $x_{i}$ to construct a parabola, rather than a perfectly straight line. The new parameter $\beta_{2}$ measures the curvature of the relationship.” ([McElreath, 2020, p. 110](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=129&annotation=S6I93I6F))


$$
\begin{align*}
h_{i} \sim \operatorname{Normal}(\mu_{i}, \sigma) \space \space (1) \\ 
\mu_{i} = \alpha + \beta_{1}x_{i} + \beta_{2}x_{i}^2 \space \space (2) \\
\alpha \sim \operatorname{Normal}(178, 20) \space \space (3)  \\ 
\beta_{1} \sim \operatorname{Log-Normal}(0, 1) \space \space (4) \\
\beta_{2} \sim \operatorname{Normal}(0, 1) \space \space (5) \\
\sigma \sim \operatorname{Uniform}(0, 50) \space \space (6)      
\end{align*}
$$ {#eq-parabolic-model}

```         
height ~ dnorm(mu, sigma)                 # (1)
mu <- a + b1 * weight.s + b2 * weight.s^2 # (2)
a ~ dnorm(178, 20)                        # (3)
b1 ~ dlnorm(0, 1)                         # (4)
b2 ~ dnorm(0, 1)                          # (5)
sigma ~ dunif(0, 50)                      # (6)
```

> “The confusing issue here is assigning a prior for $\beta_{2}$, the parameter on the squared value of $x$. Unlike $\beta_{1}$, we don’t want a positive constraint.” ([McElreath, 2020, p. 111](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=130&annotation=6CQMMGIN))
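
What these priors imply can be checked with a small prior predictive simulation. This is a minimal base-R sketch, assuming the priors `dlnorm(0, 1)` for `b1` and `dnorm(0, 1)` for `b2`; the object names (`prior_draws`, `x_grid`, `mu_prior`) are introduced here only for illustration.

```r
set.seed(4)
n_curves <- 50

# draws from the (assumed) priors of the parabolic model
prior_draws <- data.frame(
  a  = stats::rnorm(n_curves, mean = 178, sd = 20),
  b1 = stats::rlnorm(n_curves, meanlog = 0, sdlog = 1),
  b2 = stats::rnorm(n_curves, mean = 0, sd = 1)
)

x_grid <- seq(from = -2, to = 2, length.out = 30)  # standardized weight

# one row per prior draw, one column per value of x
mu_prior <- sapply(x_grid, function(x) {
  prior_draws$a + prior_draws$b1 * x + prior_draws$b2 * x^2
})

# b1 is always positive, b2 takes both signs: cups and caps are both possible
range(prior_draws$b1)
table(sign(prior_draws$b2))

# each line is one parabola implied by the priors
graphics::matplot(x_grid, t(mu_prior), type = "l", lty = 1,
                  col = grDevices::rgb(0, 0, 0, 0.3),
                  xlab = "weight (standardized)", ylab = "prior mean height")
```

Without the positive constraint, $\beta_{2}$ lets the prior curves bend upwards as well as downwards, which is the point of the quote above.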


::::
:::::


###### Cubic


:::::{.my-theorem}
:::{.my-theorem-header}
:::::: {#thm-cubic-mean}
: Cubic equation for the mean height
::::::
:::
::::{.my-theorem-container}


$$
\begin{align*}
h_{i} \sim \operatorname{Normal}(\mu_{i}, \sigma) \space \space (1) \\ 
\mu_{i} = \alpha + \beta_{1}x_{i} + \beta_{2}x_{i}^2 + \beta_{3}x_{i}^3 \space \space (2) \\
\alpha \sim \operatorname{Normal}(178, 20) \space \space (3)  \\ 
\beta_{1} \sim \operatorname{Log-Normal}(0, 1) \space \space (4) \\
\beta_{2} \sim \operatorname{Normal}(0, 10) \space \space (5) \\
\beta_{3} \sim \operatorname{Normal}(0, 10) \space \space (6) \\
\sigma \sim \operatorname{Uniform}(0, 50) \space \space (7)      
\end{align*}
$$ {#eq-cubic-model}

```         
height ~ dnorm(mu, sigma)                                   # (1)
mu <- a + b1 * weight.s + b2 * weight.s^2 + b3 * weight.s^3 # (2)
a ~ dnorm(178, 20)                                          # (3)
b1 ~ dlnorm(0, 1)                                           # (4)
b2 ~ dnorm(0, 10)                                           # (5)
b3 ~ dnorm(0, 10)                                           # (6)
sigma ~ dunif(0, 50)                                        # (7)
```
::::
:::::

:::

::::
:::::


#### Standardizing the predictor variable

> “The first thing to do is to `r glossary("Standardization", "standardize")` the `r glossary("predictor variable")`. We’ve done this in previous examples. But this is especially helpful for working with polynomial models. When predictor variables have very large values in them, there are sometimes numerical glitches. Even well-known statistical software can suffer from these glitches, leading to mistaken estimates. These problems are very common for polynomial regression, because the square or cube of a large number can be truly massive. Standardizing largely resolves this issue. It should be your default behavior.” ([McElreath, 2020, p. 111](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=130&annotation=MY2LJCW5))
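
A quick base-R sketch shows why this matters for polynomial terms. The weights below are hypothetical values in kilograms, not the !Kung data.

```r
# hypothetical raw weights in kg (not the !Kung data)
weight_raw <- c(4.5, 12.3, 25.0, 38.7, 45.2, 52.9, 61.0)

# standardize: subtract the mean, then divide by the standard deviation
weight_std <- (weight_raw - mean(weight_raw)) / stats::sd(weight_raw)

# base::scale() does the same, but returns a one-column matrix
all.equal(weight_std, as.numeric(base::scale(weight_raw)))

# magnitude of the largest polynomial term before and after standardizing
rbind(
  raw          = c(linear = max(abs(weight_raw)),
                   square = max(weight_raw^2),
                   cube   = max(abs(weight_raw^3))),
  standardized = c(linear = max(abs(weight_std)),
                   square = max(weight_std^2),
                   cube   = max(abs(weight_std^3)))
)
```

On the raw scale the cubed term reaches six digits, while all standardized terms stay in the single digits, so the predictors of the polynomial live on comparable scales.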


:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-find-post-parabolic}
: Posterior distribution of a parabolic model of height on weight
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### Original

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-quap-parabolic-m4-5a}
a: Finding the posterior distribution of a parabolic model of height on weight (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: quap-parabolic-m4-5a

## R code 4.65a #######################

## standardization ########
d_a$weight_s <- (d_a$weight - mean(d_a$weight)) / sd(d_a$weight)
d_a$weight_s2 <- d_a$weight_s^2

## find posterior distribution ########
m4.5a <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b1 * weight_s + b2 * weight_s2,
    a ~ dnorm(178, 20),
    b1 ~ dlnorm(0, 1),
    b2 ~ dnorm(0, 1),
    sigma ~ dunif(0, 50)
  ),
  data = d_a
)

## R code 4.66a ################
## precis results ###########
rethinking::precis(m4.5a)
```

**$\alpha$ (a)**: Intercept. Expected value of `height` when `weight` is at its mean value. 

> “But it is no longer equal to the mean height in the sample, since there is no guarantee it should in a polynomial regression.” ([McElreath, 2020, p. 112](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=131&annotation=2X62MWHB))

> “The implied definition of α in a parabolic model is $\alpha = Ey_{i} − \beta_{1}Ex_{i} − \beta_{2} Ex_{i}^2$. Now even when the average $x_{i}$ is zero, $Ex_{i} = 0$, the average square will likely not be zero. So $\alpha$ becomes hard to directly interpret again.” ([McElreath, 2020, p. 562](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=581&annotation=Q4D6XGEU))
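
The identity can be checked numerically. The sketch below swaps in ordinary least squares (`stats::lm()`) for `quap()` and uses simulated toy data with made-up coefficients, so it illustrates the algebra only, not the !Kung fit.

```r
set.seed(4)
n <- 500

# simulated toy data with a known curved relationship (assumption)
x_raw <- stats::runif(n, min = 5, max = 60)
x     <- (x_raw - mean(x_raw)) / stats::sd(x_raw)   # standardized predictor
y     <- 146 + 21 * x - 8 * x^2 + stats::rnorm(n, sd = 5)

# least-squares fit of the parabolic model (stand-in for quap)
fit <- stats::lm(y ~ x + I(x^2))
b   <- stats::coef(fit)

mean(x)     # practically zero after standardizing
mean(x^2)   # equals (n - 1) / n, so clearly not zero

# the intercept is no longer the mean of y ...
c(intercept = unname(b[1]), mean_y = mean(y))

# ... but it satisfies alpha = E(y) - beta1 * E(x) - beta2 * E(x^2)
alpha_implied <- mean(y) - b[["x"]] * mean(x) - b[["I(x^2)"]] * mean(x^2)
all.equal(unname(b[1]), alpha_implied)
```

Because the average of $x_{i}^2$ is close to 1 for a standardized predictor, the intercept is shifted away from the mean of the outcome by roughly $\beta_{2}$.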

**$\beta_{1}$ and $\beta_{2}$ (b1 and b2)**: Linear and square components of the curve. But that doesn’t make them transparent.

::::
:::::

> “Now, since the relationship between the outcome height and the predictor weight depends upon two slopes, $\beta_{1}$ and $\beta_{2}$, it isn’t so easy to read the relationship off a table of coefficients.” ([McElreath, 2020, p. 111](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=130&annotation=T7DM6G2P))

> “You have to plot these model fits to understand what they are saying.” ([McElreath, 2020, p. 112](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=131&annotation=6RLW99RK))


###### Tidyverse

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-brm-parabolic-m4-5b}
b: Finding the posterior distribution of a parabolic model of height on weight (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: brm-parabolic-m4-5b
#| warning: false
#| cache: true

## R code 4.65b #######################
## standardization ###########
d3_m4.5b <-
  d_b |>
  dplyr::mutate(weight_s = (weight - base::mean(weight)) / stats::sd(weight)) |> 
  dplyr::mutate(weight_s2 = weight_s^2)

## fitting model ############
m4.5b <- 
  brms::brm(data = d3_m4.5b, 
      family = gaussian,
      height ~ 1 + weight_s + weight_s2,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b, coef = "weight_s"),
                brms::prior(normal(0, 1), class = b, coef = "weight_s2"),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.05b")

## R code 4.66b ################
## print.brmsfit results ###############
brms:::print.brmsfit(m4.5b)
```

Note our use of the `coef` argument within our prior statements. Since $\beta_{1}$
and $\beta_{2}$ are both parameters of `class = b` within the {**brms**} set-up,
we need to use the `coef` argument when we want their priors to differ.

::::
:::::

###### Trace plot

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-trace-plot-m4-5b}
: Display trace plot (Tidyverse)
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: trace-plot-m4-5b

brms:::plot.brmsfit(m4.5b, widths = c(1, 2))
```

::::
:::::


:::

::::
:::::


#### Fit linear, parabolic and cubic model


:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-fig-different-regressions}
: Different regressions (linear, parabolic and cubic) of height on weight (standardized), for the full !Kung data
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### Linear (O)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-linear-full-data-m4-4a}
a: Posterior distribution of the linear height-weight model with the full !Kung data set (Original)
::::::
:::
::::{.my-r-code-container}


```{r}
#| label: fig-linear-full-data-m4-4a
#| fig-cap: "Posterior distribution of the linear height-weight model with the full !Kung data set (Original)"


## R code 4.42a #############################
# define the average weight, x-bar
xbar_m4.4a <- mean(d_a$weight_s)

# fit model
m4.4a <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b * (weight_s - xbar_m4.4a),
    a ~ dnorm(178, 20),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ),
  data = d_a
)


## R code 4.46a ############################################
plot(height ~ weight_s, data = d_a, col = rethinking::rangi2)
post_m4.4a <- rethinking::extract.samples(m4.4a)
a_map_m4.4a <- mean(post_m4.4a$a)
b_map_m4.4a <- mean(post_m4.4a$b)
curve(a_map_m4.4a + b_map_m4.4a * (x - xbar_m4.4a), add = TRUE)

## R code 4.58a ################################
mu.link_m4.4a <- function(weight) post_m4.4a$a + post_m4.4a$b * (weight - xbar_m4.4a)
weight.seq_m4.4a <- seq(from = -3, to = 3, by = 1)
mu_m4.4a <- sapply(weight.seq_m4.4a, mu.link_m4.4a)
mu.mean_m4.4a <- apply(mu_m4.4a, 2, mean)
mu.CI_m4.4a <- apply(mu_m4.4a, 2, rethinking::PI, prob = 0.89)


## R code 4.59a #######################
sim.height_m4.4a <- rethinking::sim(m4.4a, data = list(weight_s = weight.seq_m4.4a))

## R code 4.60a ###################
height.PI_m4.4a <- apply(sim.height_m4.4a, 2, rethinking::PI, prob = 0.89)

## R code 4.61a ##################
# plot raw data
# plot(height ~ weight, d_a, col = rethinking::col.alpha(rethinking::rangi2, 0.5))

# draw MAP line
graphics::lines(weight.seq_m4.4a, mu.mean_m4.4a)

# draw PI region for line (89% percentile interval of mu)
rethinking::shade(mu.CI_m4.4a, weight.seq_m4.4a)

# draw PI region for simulated heights
rethinking::shade(height.PI_m4.4a, weight.seq_m4.4a)


```


::::
:::::

> “[The graphic] shows the familiar linear regression from earlier in the chapter, but now with the standardized predictor and full data with both adults and non-adults. The linear model makes some spectacularly poor predictions, at both very low and middle weights.” ([McElreath, 2020, p. 113](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=132&annotation=6FFX8KKA))

###### Parabolic (O)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-parabolic-regression-m4-5a}
a: Parabolic regression of height on weight (standardized), for the full !Kung data (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-parabolic-regression-m4-5a
#| fig-cap: "A second order polynomial, a parabolic or quadratic regression of height on weight (standardized), for the full !Kung data (Original)"

## R code 4.67a ################
weight.seq_m4.5a <- seq(from = -2.2, to = 2, length.out = 30)
pred_dat_m4.5a <- list(weight_s = weight.seq_m4.5a, weight_s2 = weight.seq_m4.5a^2)
mu_m4.5a <- rethinking::link(m4.5a, data = pred_dat_m4.5a)
mu.mean_m4.5a <- apply(mu_m4.5a, 2, mean)
mu.PI_m4.5a <- apply(mu_m4.5a, 2, rethinking::PI, prob = 0.89)
sim.height_m4.5a <- rethinking::sim(m4.5a, data = pred_dat_m4.5a)
height.PI_m4.5a <- apply(sim.height_m4.5a, 2, rethinking::PI, prob = 0.89)


## R code 4.68a #################
plot(height ~ weight_s, d_a, col = rethinking::col.alpha(rethinking::rangi2, 0.5))
graphics::lines(weight.seq_m4.5a, mu.mean_m4.5a)
rethinking::shade(mu.PI_m4.5a, weight.seq_m4.5a)
rethinking::shade(height.PI_m4.5a, weight.seq_m4.5a)

```

::::
:::::

The quadratic regression does a pretty good job. It is much better than
a linear regression for the full `Howell1` data set.
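
One detail in the code above is easy to miss: the prediction grid `pred_dat_m4.5a` has to contain the squared term as well, because the model treats `weight_s2` as a separate predictor. The following sketch illustrates this with ordinary least squares (`stats::lm()`) on made-up, noise-free data instead of `quap()`. It only shows the data handling and does not stand in for the Bayesian fit. All names with the `_toy` suffix are hypothetical.

```r
# Made-up, noise-free data following height = 150 + 20 * w - 5 * w^2
d_toy <- data.frame(weight_s = seq(from = -2, to = 2, by = 0.5))
d_toy$weight_s2 <- d_toy$weight_s^2
d_toy$height <- 150 + 20 * d_toy$weight_s - 5 * d_toy$weight_s2

# Least-squares stand-in for the parabolic model (an assumption of this sketch)
fit_toy <- stats::lm(height ~ weight_s + weight_s2, data = d_toy)

# The prediction grid needs both the linear and the squared column
grid_toy <- data.frame(weight_s = c(-1, 0, 1))
grid_toy$weight_s2 <- grid_toy$weight_s^2

pred_toy <- stats::predict(fit_toy, newdata = grid_toy)
pred_toy
```

Leaving `weight_s2` out of `grid_toy` would make `predict()` stop with an error about the missing variable, which is why the sequences in this tab always carry their squared (and later cubed) companions.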

###### Cubic (O)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-cubic-regression-m4-6a}
a: Cubic regressions of height on weight (standardized), for the full !Kung data (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-cubic-regression-m4-6a
#| fig-cap: "A third order polynomial, a cubic regression of height on weight (standardized), for the full !Kung data (Original)"

## R code 4.69a ####################
d_a$weight_s3 <- d_a$weight_s^3
m4.6a <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + b1 * weight_s + b2 * weight_s2 + b3 * weight_s3,
    a ~ dnorm(178, 20),
    b1 ~ dlnorm(0, 1),
    b2 ~ dnorm(0, 10),
    b3 ~ dnorm(0, 10),
    sigma ~ dunif(0, 50)
  ),
  data = d_a
)

## R code 4.67a ################
weight.seq_m4.6a <- seq(from = -2.2, to = 2, length.out = 30)
pred_dat_m4.6a <- list(weight_s = weight.seq_m4.6a, weight_s2 = weight.seq_m4.6a^2,
                    weight_s3 = weight.seq_m4.6a^3)
mu_m4.6a <- rethinking::link(m4.6a, data = pred_dat_m4.6a)
mu.mean_m4.6a <- apply(mu_m4.6a, 2, mean)
mu.PI_m4.6a <- apply(mu_m4.6a, 2, rethinking::PI, prob = 0.89)
sim.height_m4.6a <- rethinking::sim(m4.6a, data = pred_dat_m4.6a)
height.PI_m4.6a <- apply(sim.height_m4.6a, 2, rethinking::PI, prob = 0.89)


## R code 4.68a #################
plot(height ~ weight_s, d_a, col = rethinking::col.alpha(rethinking::rangi2, 0.5))
graphics::lines(weight.seq_m4.6a, mu.mean_m4.6a)
rethinking::shade(mu.PI_m4.6a, weight.seq_m4.6a)
rethinking::shade(height.PI_m4.6a, weight.seq_m4.6a)


```

::::
:::::

> “This cubic curve is even more flexible than the parabola, so it fits the data even better.” ([McElreath, 2020, p. 113](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=132&annotation=X335CTFQ))

###### Linear (T)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-linear-full-data-m4-4b}
b: Fit a linear regression model of height on weight (standardized), for the full !Kung data (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-linear-full-data-m4-4b
#| fig-cap: "Fit a linear regression model of height on weight (standardized), for the full !Kung data. The raw data are shown by the circles. The solid curves show the path of μ in each model, and the shaded regions show the 89% interval of the mean (close to the solid curve) and the 89% interval of predictions (wider) (Tidyverse)"
#| cache: true
#| warning: false

d_m4.4b <-
  d_b |>
  dplyr::mutate(weight_s = (weight - base::mean(weight)) / stats::sd(weight))


## fit model m4.4b #############
m4.4b <- 
  brms::brm(data = d_m4.4b, 
      family = gaussian,
      height ~ 1 + weight_s,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b, coef = "weight_s"),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.04b")

weight_seq_m4.4b <- 
  tibble::tibble(weight_s = base::seq(from = -2.5, to = 2.5, length.out = 30))


## data wrangling: fitted, predict #########
fitd_m4.4b <-
  brms:::fitted.brmsfit(m4.4b, 
         newdata = weight_seq_m4.4b,
         probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq_m4.4b)

pred_m4.4b <-
  brms:::predict.brmsfit(m4.4b, 
          newdata = weight_seq_m4.4b,
         probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq_m4.4b) 


## plot linear model ############
ggplot2::ggplot(data = d_m4.4b, 
     ggplot2::aes(x = weight_s)) +
ggplot2::geom_ribbon(data = pred_m4.4b, 
            ggplot2::aes(ymin = Q5.5, ymax = Q94.5),
            fill = "grey83") +
ggplot2::geom_smooth(data = fitd_m4.4b,
            ggplot2::aes(y = Estimate, ymin = Q5.5, ymax = Q94.5),
            stat = "identity",
            fill = "grey70", color = "black", alpha = 1, linewidth = 1/4) +
ggplot2::geom_point(ggplot2::aes(y = height),
           color = "navyblue", shape = 1, size = 1.5, alpha = 1/3) +
ggplot2::labs(subtitle = "linear",
     y = "height") +
ggplot2::coord_cartesian(xlim = base::range(d_m4.4b$weight_s),
                         ylim = base::range(d_m4.4b$height)) +
ggplot2::theme_bw()

```

::::
:::::

> “[The graphic] shows the familiar linear regression from earlier in the chapter, but now with the standardized predictor and full data with both adults and non-adults. The linear model makes some spectacularly poor predictions, at both very low and middle weights.” ([McElreath, 2020, p. 113](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=132&annotation=6FFX8KKA))
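
The `probs = c(0.055, 0.945)` argument passed to `fitted()` and `predict()` above and the `prob = 0.89` argument of `rethinking::PI()` in the original code describe the same central 89% interval. The following is a minimal base R sketch of that arithmetic. The helper `central_probs()` and the matrix `draws_toy` are made up for illustration and belong to neither package.

```r
# Hypothetical helper: tail probabilities of a central interval
central_probs <- function(prob) {
  tail <- (1 - prob) / 2
  c(lower = tail, upper = 1 - tail)
}

probs_89 <- central_probs(0.89)   # 0.055 and 0.945

# Toy "posterior draws": 1001 rows (draws) x 2 columns (predictor values)
draws_toy <- cbind(0:1000, 1000 + 0:1000)

# One interval per column, analogous to apply(mu, 2, rethinking::PI, prob = 0.89)
interval_toy <- apply(draws_toy, 2, stats::quantile, probs = probs_89)
interval_toy
```

Each column of `interval_toy` holds the lower and upper bound for one predictor value, which is the shape that the ribbon layers in the plots expect after some reshaping.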

###### Parabolic (T)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-parabolic-regression-m4-5b}
b: Parabolic regression of height on weight (standardized), for the full !Kung data (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-parabolic-regression-m4-5b
#| fig-cap: "Polynomial regressions of height on weight (standardized), for the full !Kung data. The raw data are shown by the circles. The solid curves show the path of μ in each model, and the shaded regions show the 89% interval of the mean (close to the solid curve) and the 89% interval of predictions (wider) (Tidyverse)"


## data wrangling: fitted, predict ##############
weight_seq_m4.5b <- 
  tibble::tibble(weight_s = base::seq(from = -2.5, to = 2.5, length.out = 30)) |> 
  dplyr::mutate(weight_s2 = weight_s^2)

fitd_m4.5b <-
  brms:::fitted.brmsfit(m4.5b, 
         newdata = weight_seq_m4.5b,
         probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq_m4.5b)

pred_m4.5b <-
  brms:::predict.brmsfit(m4.5b, 
          newdata = weight_seq_m4.5b,
          probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq_m4.5b)  

## plot quadratic model #########
ggplot2::ggplot(data = d3_m4.5b, 
       ggplot2::aes(x = weight_s)) +
ggplot2::geom_ribbon(data = pred_m4.5b, 
            ggplot2::aes(ymin = Q5.5, ymax = Q94.5),
            fill = "grey83") +
ggplot2::geom_smooth(data = fitd_m4.5b,
            ggplot2::aes(y = Estimate, ymin = Q5.5, ymax = Q94.5),
            stat = "identity",
            fill = "grey70", color = "black", alpha = 1, linewidth = 1/2) +
ggplot2::geom_point(ggplot2::aes(y = height),
           color = "navyblue", shape = 1, size = 1.5, alpha = 1/3) +
ggplot2::labs(subtitle = "quadratic",
     y = "height") +
ggplot2::coord_cartesian(xlim = base::range(d3_m4.5b$weight_s),
                         ylim = base::range(d3_m4.5b$height)) +
ggplot2::theme_bw()

```

::::
:::::

The quadratic regression does a pretty good job. It is much better than
a linear regression for the full `Howell1` data set.

###### Cubic (T)

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-cubic-regression-m4-6b}
b: Cubic regressions of height on weight (standardized), for the full !Kung data (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-cubic-regression-m4-6b
#| fig-cap: "Fit a cubic regression model of height on weight (standardized), for the full !Kung data. The raw data are shown by the circles. The solid curves show the path of μ in each model, and the shaded regions show the 89% interval of the mean (close to the solid curve) and the 89% interval of predictions (wider) (Tidyverse)"
#| cache: true

## data wrangling: cubic ##############
d3_m4.6b <-
  d3_m4.5b |> 
  dplyr::mutate(weight_s3 = weight_s^3)

m4.6b <- 
  brms::brm(data = d3_m4.6b, 
      family = gaussian,
      height ~ 1 + weight_s + weight_s2 + weight_s3,
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b, coef = "weight_s"),
                brms::prior(normal(0, 1), class = b, coef = "weight_s2"),
                brms::prior(normal(0, 1), class = b, coef = "weight_s3"),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.06b")

## data wrangling: fitted, predict ##############
weight_seq_m4.6b <- 
  weight_seq_m4.5b |> 
  dplyr::mutate(weight_s3 = weight_s^3)

fitd_m4.6b <-
  brms:::fitted.brmsfit(m4.6b, 
         newdata = weight_seq_m4.6b,
          probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq_m4.6b)

pred_m4.6b <-
  brms:::predict.brmsfit(m4.6b, 
          newdata = weight_seq_m4.6b,
          probs = c(0.055, 0.945)) |>
  tibble::as_tibble() |>
  dplyr::bind_cols(weight_seq_m4.6b) 

## plot cubic model #########
ggplot2::ggplot(data = d3_m4.6b, 
     ggplot2::aes(x = weight_s)) +
ggplot2::geom_ribbon(data = pred_m4.6b, 
            ggplot2::aes(ymin = Q5.5, ymax = Q94.5),
            fill = "grey83") +
ggplot2::geom_smooth(data = fitd_m4.6b,
            ggplot2::aes(y = Estimate, ymin = Q5.5, ymax = Q94.5),
            stat = "identity",
            fill = "grey70", color = "black", alpha = 1, linewidth = 1/4) +
ggplot2::geom_point(ggplot2::aes(y = height),
           color = "navyblue", shape = 1, size = 1.5, alpha = 1/3) +
ggplot2::labs(subtitle = "cubic",
     y = "height") +
ggplot2::coord_cartesian(xlim = base::range(d3_m4.6b$weight_s),
                         ylim = base::range(d3_m4.6b$height)) +
ggplot2::theme_bw()

```

::::
:::::

> “This cubic curve is even more flexible than the parabola, so it fits the data even better.” ([McElreath, 2020, p. 113](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=132&annotation=X335CTFQ))

:::

::::
:::::


:::::{.my-watch-out}
:::{.my-watch-out-header}
WATCH OUT! Models are just geocentric descriptions, not more!
:::
::::{.my-watch-out-container}
> “… it’s not clear that any of these models make a lot of sense. They are good geocentric descriptions of the sample, yes. But there are two problems. First, a better fit to the sample might not actually be a better model. That’s the subject of @sec-chap07. Second, the model contains no biological information. We aren’t learning any causal relationship between height and weight. We’ll deal with this second problem much later, in @sec-chap16.” ([McElreath, 2020, p. 113](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=132&annotation=IAKJ5CC4))
::::
:::::


#### Converting back to natural scale


:::::{.my-procedure}
:::{.my-procedure-header}
From Z-scores to natural scale
:::
::::{.my-procedure-container}
> “The plots in @exm-fig-different-regressions have standard units on the horizontal axis. These units are sometimes called `r glossary("z-score", "z-scores")`. But suppose you fit the model using standardized variables, but want to plot the estimates on the original scale.” ([McElreath, 2020, p. 114](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=133&annotation=HZCU5QEE))

***

0.  Turn off the horizontal axis when plotting the raw data.
1.  Define the location of the labels, in standardized units.
2.  Take the standardized units and convert them back to the natural
    scale.
3.  Explicitly construct and then draw the axis with values from the natural scale.
::::
:::::
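
The conversion in step 2 is simply the inverse of the standardization. A minimal sketch with made-up weights (the vector `weight_toy` is hypothetical and not taken from `Howell1`):

```r
# Made-up weights in kg
weight_toy <- c(10, 20, 30, 40, 50)

# Standardize: subtract the mean, divide by the standard deviation
weight_mean <- mean(weight_toy)
weight_sd <- stats::sd(weight_toy)
weight_s_toy <- (weight_toy - weight_mean) / weight_sd

# Convert back: multiply by the standard deviation, add the mean
weight_back <- weight_s_toy * weight_sd + weight_mean

# Axis labels for z-scores of -1, 0 and 1 on the natural scale
at_toy <- c(-1, 0, 1)
labels_toy <- round(at_toy * weight_sd + weight_mean, 1)
labels_toy
```

The tick positions stay in standardized units (`at_toy`), only the text printed at those positions changes. This is why the fitted model does not need to be touched.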


:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-ID-text}
: From Z-scores back to natural scale
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### Original

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-standardized-natural-scale-a}
a: From Z-scores back to natural scale (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-standardized-natural-scale-a
#| fig-cap: "Height against weight: Standardized but with natural scale (Original)"

## R code 4.71a #############
plot(height ~ weight_s, d_a, col = rethinking::col.alpha(rethinking::rangi2, 0.5), xaxt = "n", xlab = "Weight in kg", ylab = "Height in cm")

at_a <- c(-2, -1, 0, 1, 2)                           # 1
labels <- at_a * sd(d_a$weight) + mean(d_a$weight)   # 2
axis(side = 1, at = at_a, labels = round(labels, 1)) # 3
```
1.  Defines the location of the labels, in standardized units.
2.  Takes standardized units and converts them back to the original
    scale.
3.  Draws the axis.

Take a look at the help `?axis` for more details.
::::
:::::


###### Tidyverse

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-standardized-natural-scale-b}
b: From Z-scores back to natural scale (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-standardized-natural-scale-b

at_b <- c(-2, -1, 0, 1, 2)

## plot linear model ############
ggplot2::ggplot(data = d_m4.4b, 
     ggplot2::aes(x = weight_s)) +
ggplot2::geom_ribbon(data = pred_m4.4b, 
            ggplot2::aes(ymin = Q5.5, ymax = Q94.5),
            fill = "grey83") +
ggplot2::geom_smooth(data = fitd_m4.4b,
            ggplot2::aes(y = Estimate, ymin = Q5.5, ymax = Q94.5),
            stat = "identity",
            fill = "grey70", color = "black", alpha = 1, linewidth = 1/4) +
ggplot2::geom_point(ggplot2::aes(y = height),
           color = "navyblue", shape = 1, size = 1.5, alpha = 1/3) +
ggplot2::labs(subtitle = "linear", 
              y = "height in cm", x = "weight in kg") +
ggplot2::coord_cartesian(xlim = base::range(d_m4.4b$weight_s),
                         ylim = base::range(d_m4.4b$height)) +
ggplot2::theme_bw() +
  
# here it is!
ggplot2::scale_x_continuous("standardized weight converted back: Weight in kg",
                   breaks = at_b,
                   labels = base::round(at_b * stats::sd(d_m4.4b$weight) + 
                            base::mean(d_m4.4b$weight), 1))
```

::::
:::::

:::

::::
:::::


### Splines

#### How do splines work?

> “The second way to introduce a curve is to construct something known as a spline. The word spline originally referred to a long, thin piece of wood or metal that could be anchored in a few places in order to aid drafters or designers in drawing curves. In statistics, a spline is a smooth function built out of smaller, component functions. There are actually many types of splines. The `r glossary("b-spline")` we’ll look at here is commonplace. The “B” stands for “basis,” which here just means “component.” B-splines build up wiggly functions from simpler less-wiggly components. Those components are called basis functions.” ([McElreath, 2020, p. 114](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=133&annotation=3RJEARGI))

> “The short explanation of B-splines is that they divide the full range of some predictor variable, like `year`, into parts. Then they assign a parameter to each part. These parameters are gradually turned on and off in a way that makes their sum into a fancy, wiggly curve.” ([McElreath, 2020, p. 115](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=134&annotation=YGZ4PE5G))

> “With B-splines, just like with polynomial regression, we do this by generating new predictor variables and using those in the linear model, $\mu_{i}$. Unlike polynomial regression, B-splines do not directly transform the predictor by squaring or cubing it. Instead they invent a series of entirely new, synthetic predictor variables. Each of these synthetic variables exists only to gradually turn a specific parameter on and off within a specific range of the real predictor variable. Each of the synthetic variables is called a `r glossary("basis function")`.” ([McElreath, 2020, p. 115](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=134&annotation=4HL4NXBW))

:::::{.my-theorem}
:::{.my-theorem-header}
:::::: {#thm-chap04-b-splines}
: Linear model with B-splines
::::::
:::
::::{.my-theorem-container}
$$
\mu_{i} = \alpha + w_{1}B_{i,1} + w_{2}B_{i,2} + w_{3}B_{i,3} + ...
$$
> “… $B_{i,n}$ is the $n$-th basis function’s value on row $i$, and the $w$ parameters are corresponding weights for each. The parameters act like slopes, adjusting the influence of each basis function on the mean $\mu_{i}$. So really this is just another linear regression, but with some fancy, synthetic predictor variables.” ([McElreath, 2020, p. 115](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=134&annotation=H5Y2V6Y3))

:::::{.my-procedure}
:::{.my-procedure-header}
:::::: {#prp-chap04-b-splines-m4-7a}
: How to generate B-splines?
::::::
:::
::::{.my-procedure-container}
1.  Choose the number and distribution of `r glossary("knots")`
2.  Choose the `r glossary("polynomial degree")`
3.  Get the parameter weights for each basis function (define the model
    and make it run)
::::
:::::

::::
:::::
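
To see how the weights act on the basis functions, it can help to look at the simplest case. The following sketch uses a degree 1 B-spline on a made-up predictor, with hypothetical values for $\alpha$ and $w$; nothing is estimated here. With degree 1 each basis function is a "tent" that peaks at one knot, so at a knot $\mu$ equals $\alpha$ plus a single weight, and between two knots it blends the two neighbouring weights.

```r
# Made-up predictor values; 1 and 5 become the boundary knots
x_toy <- c(1, 2, 2.5, 3, 4, 5)

# Degree 1 basis with three internal knots: 3 + 1 + 1 = 5 basis functions
B_toy <- splines::bs(x_toy, knots = c(2, 3, 4), degree = 1, intercept = TRUE)

# Hypothetical intercept and weights (not estimated from any data)
alpha_toy <- 100
w_toy <- c(-2, 4, 0, -6, 2)

# mu_i = alpha + w_1 * B_i1 + w_2 * B_i2 + ...
mu_toy <- alpha_toy + as.vector(B_toy %*% w_toy)

data.frame(x = x_toy, mu = mu_toy)
```

In this toy setting the value of `mu_toy` at `x = 2.5` lies halfway between the values at the knots 2 and 3, because both neighbouring tents are "half on" there. Higher degrees work the same way, only with smoother basis functions that overlap more.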

#### Original fit

:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-chap04-fit-cherry-blossoms-m4-7a}
: Fit Cherry Blossoms data (Original)
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### data

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-load-data-d2-m4-7a}
a: Load Cherry Blossoms data, filter by complete cases and display summary `precis`
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-load-data-d-m4-7a

## R code 4.72 modified ######################
data(package = "rethinking", list = "cherry_blossoms")
d_m4.7a <- cherry_blossoms
d2_m4.7a <- d_m4.7a[ complete.cases(d_m4.7a$doy) , ] # complete cases on doy
rethinking::precis(d2_m4.7a)
```
The original data set has 1,215 observations, but only 827 records have a value in the day-of-year column (`doy`, day of the year of first blossom).
::::
:::::

See `?cherry_blossoms` for details and sources.

###### plot

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-plot-d2-m4-7a}
a: Display raw data for `doy` (Day of the year of first blossom) against the year
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-plot-d2-m4-7a
#| fig-cap: "Display raw data for `doy` (Day of the year of first blossom) against the year (Original)"

plot(doy ~ year, data = d2_m4.7a, col = rethinking::rangi2)
```


> “We’re going to work with the historical record of first day of blossom, `doy`, for now. It ranges from 86 (late March) to 124 (early May). The years with recorded blossom dates run from 812 `r glossary("CE/AD", "CE")` to 2015 CE.” ([McElreath, 2020, p. 114](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=133&annotation=QTB8HPTD))
::::
:::::

You can see that the `doy` data are sparse in the early years. Their number increases steadily approaching the year 2000.

###### knots


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-knots-m4-7a}
a: Choose number of knots and distribute them over the data points of the `year` variable. (Original)
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: chap04-knots-m4-7a
#| fig-cap: "Choose 15 knots that divide the data points of the year variable into 14 segments for fitting B-splines in the next step (Original)"

## R code 4.73a complete cases on doy ##########
num_knots <- 15                                  # (1) number of cutpoints
knot_list <- quantile(d2_m4.7a$year,             # (2) divide regions for fitting
      probs = seq(0, 1, length.out = num_knots))
print(tibble::enframe(knot_list), n = 15)        # (3) display knot_list

```

The locations of the knots are part of the model. Therefore you are responsible for them. We placed the knots at evenly spaced quantiles of the predictor variable.

**Explanation of code lines**

(1) **num_knots**: Specify the number of cutpoints that define different regions (or partitions) of a variable. Here 15 knots were chosen.
(2) **knot_list**: Vector of knot positions, named after the percentiles, that divides the available rows of the variable `year` into 14 parts. It is important to understand that it is not the range of the variable `year` that is divided evenly but the available data points.
(3) I wrapped `knot_list` in a tibble and printed it with `print(tibble::enframe(knot_list), n = 15)`, so that the content of the vector can be inspected more easily.

::::
:::::

Again you can see that the `doy` data are sparse in the early years. Starting with the 16th century, the distances between adjacent knots become similar. This is easier to inspect graphically in the next tab, "parts".
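
The difference between dividing the data points and dividing the range can be made concrete with a small sketch. The years in `year_toy` are made up (sparse early, dense late) and the helper `count_per_segment()` is hypothetical; only `stats::quantile()`, `cut()` and `table()` are real functions.

```r
# Made-up years: sparse early on, dense later
year_toy <- c(800, 900, 1000, 1100, 1200, 1300, 1400, 1500, 1550, 1600, 1650,
              1700, 1750, 1800, 1825, 1850, 1875, 1900, 1925, 1950, 1975)

num_knots_toy <- 5

# Knots at evenly spaced quantiles of the data (the approach used above)
knots_quantile <- stats::quantile(year_toy,
  probs = seq(0, 1, length.out = num_knots_toy))

# Knots at evenly spaced positions of the range (for comparison only)
knots_range <- seq(min(year_toy), max(year_toy), length.out = num_knots_toy)

# Hypothetical helper: count the data points in each of the num_knots - 1 segments
count_per_segment <- function(x, knots) {
  as.vector(table(cut(x, breaks = knots, include.lowest = TRUE)))
}

count_per_segment(year_toy, knots_quantile)   # 6 5 5 5
count_per_segment(year_toy, knots_range)      # 3 3 5 10
```

With quantile-based knots every segment holds roughly the same number of observations, so each part of the spline is informed by a similar amount of data.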

###### parts

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-plot-parts-d2-m4-7a}
a: Plot data with an equal number of `doy` data points for each segment against `year` 
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-chap04-plot-parts-d2-m4-7a
#| fig-cap: "Display raw data for `doy` (Day of the year of first blossom) against the year with vertical lines at the knot positions (Original)"

plot(doy ~ year, data = d2_m4.7a, col = rethinking::rangi2)
abline(v = knot_list)
```
::::
:::::

Starting with the 16th century, we get similar intervals between adjacent knots (59 to 67 years), i.e., the `doy` data points are approximately evenly distributed over the years.

```{r}
#| label: chap04-knot_list-diff-d2-m4-7a

knot_list2 <- tibble::enframe(knot_list) |> 
  dplyr::mutate(diff = round(value - dplyr::lag(value), 0))
print(knot_list2, n = 15)
```


###### calc


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-str-splines-m4-7a}
a: Calculate basis functions of a cubic spline (degree = 3) with 15 knots (Original)
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: chap04-str-splines-m4-7a

## R code 4.74a ################
B_m4.7a <- splines::bs(d2_m4.7a$year,            # (1) generate B-splines
  knots = knot_list[-c(1, num_knots)],           # (2) knots without first & last
  degree = 3,                                    # (3) polynomial (cubic) degree
  intercept = TRUE)                              # (4) intercept
str(B_m4.7a)                                     # (5) show data structure                                   
```

**Explanation of code lines**

(1) The function `splines::bs()` generates the B-spline basis matrix for
    a polynomial spline.
(2) `knots` is passed without the two boundary knots (first and last
    knot), which sit at the minimum and maximum of the variable.
    These two knots are excluded with the tricky code `knot_list[-c(1, num_knots)]` to prevent redundancies, because
    `splines::bs()` places knots at the boundaries by default. So we
    have 13 internal knots.
(3) With `degree = 3` a cubic B-spline is chosen. The polynomial degree
    determines how basis functions combine, which in turn determines how the
    parameters interact to produce the spline. For degree 1 (=
    line), two basis functions combine at each point. For degree 2 (=
    quadratic), three functions combine at each point. For degree 3 (=
    cubic), four basis functions combine at each point. This should give
    enough flexibility for each region to fit.
(4) McElreath chose `intercept = TRUE`: "We'll also have an intercept
    to capture the average blossom day. This will make it easier to
    define priors on the basis weights, because then we can just
    conceive of each as a deviation from the intercept." ([McElreath,
    2020, p. 117](zotero://select/groups/5243560/items/NFUEVASQ))
    ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=136&annotation=9MA7DU8W))\
    Kurz mentioned ominously: "For reasons I'm not prepared to get into,
    here, splines don't always include intercept parameters. Indeed, the
    `bs()` default is `intercept = FALSE`."
    ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#splines.)).\
    I do not know exactly what the effect of choosing
    `intercept = TRUE` is, as the difference to `FALSE` is marginal. The first
    B-spline (third hill) starts on the left edge of the graph with
    $1.0$ instead of $0$.
(5) In the second to last line of the data structure you can see that there are two
    "Boundary.knots" at the years $812$ and $2015$. These two years are in
    fact the first and last values of the `year` variable:

-   `dplyr::first(d2_m4.7a$year)`: `r dplyr::first(d2_m4.7a$year)`
-   `dplyr::last(d2_m4.7a$year)` : `r dplyr::last(d2_m4.7a$year)`
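
The effect of `degree` and `intercept` on the basis matrix can be checked on a small example. This is a sketch on a made-up predictor (`x_toy` and `inner_knots_toy` are hypothetical); it uses only `splines::bs()` as in the chunk above.

```r
# Made-up predictor on a regular grid
x_toy <- seq(from = 0, to = 1, length.out = 101)
inner_knots_toy <- c(0.25, 0.5, 0.75)

B_with <- splines::bs(x_toy, knots = inner_knots_toy, degree = 3, intercept = TRUE)
B_without <- splines::bs(x_toy, knots = inner_knots_toy, degree = 3, intercept = FALSE)

# Number of basis functions: internal knots + degree (+ 1 with intercept)
ncol(B_with)      # 3 + 3 + 1
ncol(B_without)   # 3 + 3

# With intercept = TRUE the basis functions add up to 1 at every x
range(rowSums(B_with))

# At most degree + 1 = 4 cubic basis functions are non-zero at any point
max(rowSums(B_with > 1e-12))
```

By the same count, the 13 internal knots used above with `degree = 3` and `intercept = TRUE` give 13 + 3 + 1 = 17 basis functions. In this toy example `intercept = FALSE` simply drops the first basis function, the one that equals 1 at the left boundary.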

::::
:::::


###### basis

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-basis-splines-m4-7a}
a: Draw raw basis functions for the year variable for 15 knots and degree 3 (= cubic polynomial) (Original)
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-chap04-basis-splines-m4-7a
#| fig-cap: "Draw raw basis functions (B-splines) for the year variable for 15 knots and degree 3 (cubic polynomial) (Original)"

## R code 4.75a #############
plot(NULL, xlim = range(d2_m4.7a$year),     
  ylim = c(0, 1), xlab = "year", ylab = "basis")  
for (i in 1:ncol(B_m4.7a)) 
  lines(d2_m4.7a$year, B_m4.7a[, i])
```

::::
:::::

###### formula


:::::{.my-theorem}
:::{.my-theorem-header}
:::::: {#thm-theorem-text}
a: Linear model for Cherry Blossoms data
::::::
:::
::::{.my-theorem-container}

$$
\begin{align*}
\text{day of year}_{i} \sim \operatorname{Normal}(\mu_{i}, \sigma) \\
\mu_{i} = \alpha + {\sum_{k=1}^K w_{k} B_{k, i}} \\
\alpha \sim \operatorname{Normal}(100, 10) \\
w_{k} \sim \operatorname{Normal}(0, 10) \\
\sigma \sim \operatorname{Exponential}(1)
\end{align*}
$${#eq-lm-blossoms}

***
> where $\alpha$ is the intercept, $B_{k, i}$ is the value of the $k^\text{th}$ basis function on the $i^\text{th}$ row of the data, and $w_k$ is the estimated regression weight for the corresponding $k^\text{th}$ basis function. ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#splines.))

::::
:::::

> “[The model] is multiplying each basis value by a corresponding parameter $w_{k}$ and then adding up all $K$ of those products. This is just a compact way of writing a linear model. The rest should be familiar. … the $w$ priors influence how wiggly the spline can be.” ([McElreath, 2020, p. 118](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=137&annotation=Z6NJ85VE))

> “This is also the first time we’ve used an `r glossary("exponential distribution")` as a `r glossary("prior probability", "prior")`. Exponential distributions are useful priors for scale parameters, parameters that must be positive. The prior for $\sigma$ is exponential with a rate of $1$. The way to read an exponential distribution is to think of it as containing no more information than an average deviation. That average // is the inverse of the rate. So in this case it is $1/1 = 1$. If the rate were $0.5$, the mean would be $1/0.5 = 2$. We’ll use exponential priors for the rest of the book, in place of uniform priors. It is much more common to have a sense of the average deviation than of the maximum.” ([McElreath, 2020, p. 118/119](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=138&annotation=N2XUQTJB))

:::::{.my-resource}
:::{.my-resource-header}
Exponential distributions in R
:::
::::{.my-resource-container}
To learn more about the exponential distribution and its single parameter $\lambda$ (lambda), which is also called the *rate*, and how to apply it in R, read [Exponential distribution in R](https://r-coder.com/exponential-distribution-r/).
::::
:::::
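
The statement that the mean of an exponential distribution is the inverse of its rate can be checked numerically. The following sketch uses a made-up helper, `exp_mean()`, that integrates $x \cdot f(x)$ with `stats::integrate()`; it is an illustration only, not part of the book's code.

```r
# Hypothetical helper: mean of an exponential distribution by numerical integration
exp_mean <- function(rate) {
  stats::integrate(function(x) x * stats::dexp(x, rate = rate),
                   lower = 0, upper = Inf)$value
}

exp_mean(rate = 1)     # close to 1 / 1
exp_mean(rate = 0.5)   # close to 1 / 0.5

# Share of the prior mass that lies below the mean (1 - exp(-1), about 0.63)
stats::pexp(1, rate = 1)
```

For the prior $\sigma \sim \operatorname{Exponential}(1)$ this means an average deviation of 1 with a long right tail, so larger values of $\sigma$ remain possible but become steadily less plausible.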


###### quap

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-fit-model-precis-m4-7a}
a: Fit model and show summary (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-fit-model-precis-m4-7a
#| cache: true

## R code 4.76a ############
m4.7a <- rethinking::quap(
  alist(
    D ~ dnorm(mu, sigma),
    mu <- a + B_m4.7a %*% w,
    a ~ dnorm(100, 10),
    w ~ dnorm(0, 10),
    sigma ~ dexp(1)
  ),
  data = list(D = d2_m4.7a$doy, B_m4.7a = B_m4.7a),
  start = list(w = rep(0, ncol(B_m4.7a)))
)

rethinking::precis(m4.7a, depth = 2)
```

The model conforms to the content in tab "formula".

> “To build this model in quap, we just need a way to do that sum. The easiest way is to use matrix multiplication.” ([McElreath, 2020, p. 119](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=138&annotation=3EWHEG6T))

> “Matrix algebra is just a new way to represent ordinary algebra. It is often much more compact. So to make model `m4.7a` easier to program, we used a matrix multiplication of the basis matrix `B_m4.7a` by the vector of parameters $w$: `B %*% w`. This notation is just linear algebra shorthand for (1) multiplying each element of the vector w by each value in the corresponding row of B and then (2) summing up each result.” ([McElreath, 2020, p. 120](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=139&annotation=FXMD6NR9))

The difference between matrix multiplication and the traditional style shows in the line with the basis matrix `B_m4.7a`:

(1) Matrix multiplication: `mu <- a + B %*% w`
(2) Less elegant code but with the same result: `mu <- a + sapply(1:827, function(i) sum(B[i, ] * w))`
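
That both styles give the same numbers can be verified on a tiny example. The matrix `B_toy` and the weights `w_toy` are made up for this sketch and much smaller than the real basis matrix.

```r
# Made-up basis matrix: 3 rows (observations) x 2 columns (basis functions)
B_toy <- matrix(c(1.0, 0.0,
                  0.5, 0.5,
                  0.0, 1.0), nrow = 3, byrow = TRUE)
w_toy <- c(2, 4)   # hypothetical weights

# (1) Matrix multiplication; as.vector() drops the one-column matrix shape
mu_matrix <- as.vector(B_toy %*% w_toy)

# (2) Row-by-row sum of products
mu_loop <- sapply(seq_len(nrow(B_toy)), function(i) sum(B_toy[i, ] * w_toy))

all.equal(mu_matrix, mu_loop)
```

The sketch uses `seq_len(nrow(B_toy))` instead of a hard-coded row count such as `1:827`, so it keeps working if the number of rows changes.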


::::
:::::

With `precis(m4.7a, depth = 2)` we looked at the posterior means and saw 17 $w$ parameters. But this does not help much. We need to plot the posterior predictions (see the last tab "plot2").

###### weighted

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-weighted-splines-m4-7a}
a: Draw weighted basis functions (B-splines)
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-chap04-weighted-splines-m4-7a
#| fig-cap: "Draw weighted basis functions (B-splines) for the year variable for 15 areas (knots) and degree 3 (cubic polynomial) (Original)"

## R code 4.77a #############
post_m4.7a <- rethinking::extract.samples(m4.7a)
w_m4.7a <- apply(post_m4.7a$w, 2, mean)
plot(NULL,
  xlim = base::range(d2_m4.7a$year), ylim = c(-6, 6),
  xlab = "year", ylab = "basis * weight"
)
for (i in 1:ncol(B_m4.7a)) 
    lines(d2_m4.7a$year, w_m4.7a[i] * B_m4.7a[, i])

```


To get the parameter weights for each basis function, we had to define the model (tab "formula") and make it run (tab "fit").

::::
:::::

Compare the weighted basis functions with the raw basis function (tab "basis") and with the wiggles in the Cherry Blossoms data (tab "plot2").


###### plot2

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-plot-post-m4-7a}
a: Plot the 97% posterior interval for $\mu$, at each year
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-chap04-plot-post-m4-7a
#| fig-cap: "Posterior prediction: The sum of the weighted basis functions, at each point, produces the spline, displayed as a 97% posterior interval of μ"

## R code 4.78a ###############
mu_m4.7a <- rethinking::link(m4.7a)
mu_PI_m4.7a <- apply(mu_m4.7a, 2, rethinking::PI, 0.97)
plot(d2_m4.7a$year, d2_m4.7a$doy, 
     col = rethinking::col.alpha(rethinking::rangi2, 0.3), pch = 16)
rethinking::shade(mu_PI_m4.7a, d2_m4.7a$year, 
     col = rethinking::col.alpha("black", 0.5))
```

::::
:::::

> “The spline is much wigglier now. Something happened around 1500, for example. If you add more knots, you can make this even wigglier. You might wonder how many knots is correct. We’ll be ready to address that question in a few more chapters” ([McElreath, 2020, p. 119](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=138&annotation=Z5X4FZCM))

:::

::::
:::::

#### Tidyverse fit


:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-chap04-fit-cherry-blossoms-m4-7b}
b: Fit Cherry Blossoms data (Tidyverse)
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### data

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-load-data-d2-m4-7b}
b: Load Cherry Blossoms data, complete cases and display summary
::::::
:::
::::{.my-r-code-container}


```{r}
#| label: load-cherry-blossoms-data-m4-7b

## R code 4.72b modified ######################
data(package = "rethinking", list = "cherry_blossoms")
d_m4.7b <- cherry_blossoms

d2_m4.7b <- 
  d_m4.7b |> 
  tidyr::drop_na(doy)


# ground-up tidyverse way to summarize
(
    d2_m4.7b |> 
      tidyr::gather() |> 
      dplyr::group_by(key) |> 
      dplyr::summarise(mean = base::mean(value, na.rm = T),
                sd   = stats::sd(value, na.rm = T),
                ll   = stats::quantile(value, prob = .055, na.rm = T),
                ul   = stats::quantile(value, prob = .945, na.rm = T)) |> 
      dplyr::mutate(dplyr::across(where(is.double), 
                                  \(x) round(x, digits = 2)))
)

```

Within the tidyverse, we can drop NA's with the `tidyr::drop_na()` function. This is much easier than the approach I originally used with `dplyr::filter()`. (See also the different ways in
[StackOverflow](https://stackoverflow.com/questions/70848048/filtering-any-missing-values-in-r/70848085).)

::::
:::::


:::::{.my-watch-out}
:::{.my-watch-out-header}
WATCH OUT! `across()` supersedes the family of "scoped variants" like `mutate_if()`
:::
::::{.my-watch-out-container}

I am using [Kurz](https://bookdown.org/content/4857/geocentric-models.html#splines)’ "ground-up {**tidyverse**} way" to summarize the data. But instead of the superseded `dplyr::mutate_if()` function I used the newer `dplyr::across()`. At first I wrote `dplyr::mutate(dplyr::across(where(is.double), round, digits = 2))`, but after a warning I changed it to an anonymous function:

```
  # Previously
  across(a:b, mean, na.rm = TRUE)

  # Now
  across(a:b, \(x) mean(x, na.rm = TRUE))
```

::::
:::::


###### skim

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-skim-d2-m4-7b}
b: Display summary with `skimr::skim()`
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: summarize-data-with-skim-m4-7b

d2_m4.7b |>
  skimr::skim()
```

Kurz's version does not have the mini histograms. I added another
summary with `skimr::skim()` to add tiny graphics.

::::
:::::

###### plot

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-ID-text}
b: Display raw data for `doy` (Day of the year of first blossom) against the year
::::::
:::
::::{.my-r-code-container}


```{r fig.height=3, fig.width=7}
#| label: fig-chap04-plot-d2-m4-7b
#| fig-cap: "Display raw data for `doy` (Day of the year of first blossom) against the year (Tidyverse)"
#| fig-height: 3
#| fig-width: 7

d2_m4.7b |> 
  ggplot2::ggplot(ggplot2::aes(x = year, y = doy)) +
  ## color from https://www.colorhexa.com/ffb7c5
  ggplot2::geom_point(color = "#ffb7c5", alpha = 1/2) +
  ggplot2::theme_bw() +
  ggplot2::theme(panel.grid = ggplot2::element_blank(),
    panel.background = ggplot2::element_rect(fill = "#4f455c"))

## color from https://www.colordic.org/w/, 
## inspired by 
## https://chichacha.netlify.com/2018/11/29/plotting-traditional-colours-of-japan/

```

By default {**ggplot2**} removes missing data records with a warning. But I had already removed missing data for the `doy` variable (see: tab "data").

::::
:::::

###### knots

::: {.callout-note style="color: blue;"}
This is the same code and text as in tab "knots" of @exm-chap04-fit-cherry-blossoms-m4-7a.
:::


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-knots-m4-7b}
b: Choose number of knots and distribute them over the data points of the `year` variable.
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: chap04-knots-m4-7b
#| fig-cap: "Choose 15 knots that divide the data points of the year variable into 14 segments for fitting B-splines in the next step (Tidyverse)"

## R code 4.73b complete cases on doy ##########
num_knots_b <- 15                                  # (1) number of cutpoints
knot_list_b <- quantile(d2_m4.7b$year,             # (2) divide regions for fitting
      probs = seq(0, 1, length.out = num_knots_b))
print(tibble::enframe(knot_list_b), n = 15)        # (3) display knot_list

```

The locations of the knots are part of the model. Therefore you are responsible for them. We placed the knots at different evenly-spaced quantiles of the predictor variable.

**Explanation of code lines**

(1) **num_knots_b**: Specify the number of cutpoints that define different regions (or partitions) for a variable. Here 15 knots were chosen.
(2) **knot_list_b**: Vector of 15 cutpoints, named after the percentiles, that divides the available rows of the variable `year` into 14 segments. It is important to understand that it is not the range of the variable `year` that is divided evenly but the available data points.
(3) I wrapped `knot_list_b` into a tibble and printed it with `print(tibble::enframe(knot_list_b), n = 15)`, so that one can inspect the content of the vector more easily.
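
To see what "dividing the data points, not the range" means, here is a minimal sketch with a made-up predictor `year_toy` (hypothetical values, not the cherry blossom data) that is sparse in the early years and dense later on. All object names ending in `_toy` are assumptions for illustration only.

```r
# Hypothetical predictor: few early values, many late values
year_toy <- c(seq(800, 1400, by = 50),    # sparse part: 13 values
              seq(1401, 2000, by = 3))    # dense part: 200 values

# 5 knots at evenly spaced quantiles give 4 segments
num_knots_toy <- 5
knots_toy <- stats::quantile(year_toy,
                             probs = seq(0, 1, length.out = num_knots_toy))
knots_toy

# number of data points falling into each segment: roughly equal
counts_toy <- table(cut(year_toy, breaks = knots_toy, include.lowest = TRUE))
counts_toy

# width of each segment in years: the sparse first segment is much wider
widths_toy <- diff(knots_toy)
widths_toy
```

In this sketch every segment holds about the same number of observations, while the first segment spans far more years than the others, which is the pattern one would expect for the early, sparse part of the `year` variable.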

::::
:::::

Again you can see that the `doy` data are sparse in the early years. Starting with the 16th century we get similar distances between the knot years. This can be inspected better graphically in the next tab "parts".

###### parts

::: {.callout-note style="color: blue;"}
The code is different but the text is the same as in tab "parts" of @exm-chap04-fit-cherry-blossoms-m4-7a.
:::

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-plot-parts-d2-m4-7b}
b: Plot data with an equal number of `doy` data points for each segment against `year`
::::::
:::
::::{.my-r-code-container}

```{r, fig.height=3, fig.width=7}
#| label: fig-chap04-plot-parts-d2-m4-7b
#| fig-cap: "Display raw data for `doy` (Day of the year of first blossom) against the year with vertical lines at the knots positions (Tidyverse)"
#| fig-height: 3
#| fig-width: 7

d2_m4.7b |> 
  ggplot2::ggplot(ggplot2::aes(x = year, y = doy)) +
  ggplot2::geom_vline(xintercept = knot_list_b, 
             color = "white", alpha = 1/2) +
  ggplot2::geom_point(color = "#ffb7c5", alpha = 1/2) +
  ggplot2::theme_bw() +
  ggplot2::theme(panel.background = ggplot2::element_rect(fill = "#4f455c"),
        panel.grid = ggplot2::element_blank())
```

::::
:::::


Starting with the 16th century we get similar distances between the knot years, i.e. the `doy` data points are approximately evenly distributed over time (59 - 67 years between knots).

```{r}
#| label: chap04-knot_list-diff-d2-m4-7b

knot_list2 <- tibble::enframe(knot_list_b) |> 
  dplyr::mutate(diff = round(value - dplyr::lag(value), 0))
print(knot_list2, n = 15)
```


###### calc

::: {.callout-note style="color: blue;"}
This is the same code and text as in tab "calc" of @exm-chap04-fit-cherry-blossoms-m4-7a.
:::


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-str-splines-m4-7b}
b: Calculate basis functions of a cubic spline (degree = 3) with 15 areas (knots) to fit
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-str-splines-m4-7b

## R code 4.74b ################
B_m4.7b <- splines::bs(d2_m4.7b$year,            # (1) generate B-splines
  knots = knot_list_b[-c(1, num_knots_b)],       # (2) knots without first & last
  degree = 3,                                    # (3) polynomial (cubic) degree
  intercept = TRUE)                              # (4) intercept
str(B_m4.7b)                                     # (5) show data structure
```

**Explanation of code lines**

(1) The function `splines::bs()` generates the B-spline basis matrix for
    a polynomial spline.
(2) `knots` is passed without the two boundary knots (first and last
    knot), which are placed at the minimum and maximum of the variable.
    These two knots are excluded with the code `knot_list_b[-c(1, num_knots_b)]` to prevent redundancies, as
    `splines::bs()` places knots at the boundaries by default. So we
    have 13 internal knots.
(3) With `degree = 3` a cubic B-spline is chosen. The polynomial degree
    determines how basis functions combine, which in turn determines how
    the parameters interact to produce the spline. For degree 1 (=
    line), two basis functions combine at each point. For degree 2 (=
    quadratic), three functions combine at each point. For degree 3 (=
    cubic), four basis functions combine at each point. This should give
    enough flexibility for each region to fit.
(4) McElreath chose `intercept = TRUE`: "We'll also have an intercept
    to capture the average blossom day. This will make it easier to
    define priors on the basis weights, because then we can just
    conceive of each as a deviation from the intercept." ([McElreath,
    2020, p. 117](zotero://select/groups/5243560/items/NFUEVASQ))
    ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=136&annotation=9MA7DU8W))\
    Kurz mentioned ominously: "For reasons I'm not prepared to get into,
    here, splines don't always include intercept parameters. Indeed, the
    `bs()` default is `intercept = FALSE`."
    ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#splines.)).\
    I do not know exactly what the effect of choosing
    `intercept = TRUE` is, as the difference to `FALSE` looks marginal: the first
    B-spline (third hill) starts on the left edge of the graph with
    $1.0$ instead of $0$.
(5) In the second to last line of the data structure you can see that there are two
    "Boundary.knots" at the years $812$ and $2015$. These two years are in
    fact the first and last value of the `year` variable:

-   `dplyr::first(d2_m4.7b$year)`: `r dplyr::first(d2_m4.7b$year)`
-   `dplyr::last(d2_m4.7b$year)` : `r dplyr::last(d2_m4.7b$year)`
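
The number of columns and the effect of `intercept` can be checked directly. This is a minimal sketch, assuming a made-up predictor `x_toy` (not the cherry blossom data) and the same scheme of 15 quantile knots. All object names are hypothetical.

```r
x_toy <- seq(0, 1, length.out = 101)     # hypothetical predictor

# 15 quantile knots, then drop the first and last (boundary) knot
inner_knots <- stats::quantile(x_toy,
                               probs = seq(0, 1, length.out = 15))[-c(1, 15)]

B_int   <- splines::bs(x_toy, knots = inner_knots,
                       degree = 3, intercept = TRUE)
B_noint <- splines::bs(x_toy, knots = inner_knots,
                       degree = 3, intercept = FALSE)

ncol(B_int)     # 13 internal knots + degree 3 + 1 = 17 columns
ncol(B_noint)   # one column fewer: 16

# with intercept = TRUE the basis functions sum to 1 at every x
range(rowSums(B_int))

# values of the first four basis functions at the left boundary
B_int[1, 1:4]
```

So, at least in this toy setting, `intercept = TRUE` keeps the first basis function, which is the one that starts at $1.0$ on the left edge. With `intercept = FALSE` that column is dropped.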

::::
:::::

###### basis

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-basis-splines-m4-7b}
b: Draw raw basis functions for the year variable for 15 areas (knots) and degree 3 (= cubic polynomial)
::::::
:::
::::{.my-r-code-container}

```{r, fig.height=3, fig.width=7}
#| label: fig-chap04-basis-splines-m4-7b
#| fig-cap: "Draw raw basis functions B-splines for the year variable for 15 areas (knots) and degree 3 (cubic polynomial) (Tidyverse)"
#| fig-height: 3
#| fig-width: 7

# wrangle a bit
d_B_m4.7b <-
  B_m4.7b |> 
  base::as.data.frame() |> 
  rlang::set_names(stringr::str_c(0, 1:9), 10:17) |>  
  dplyr::bind_cols(dplyr::select(d2_m4.7b, year)) |> 
  tidyr::pivot_longer(-year,
               names_to = "bias_function",
               values_to = "bias")

# plot
d_B_m4.7b |> 
  ggplot2::ggplot(ggplot2::aes(x = year, y = bias, group = bias_function)) +
  ggplot2::geom_vline(xintercept = knot_list_b, color = "white", alpha = 1/2) +
  ggplot2::geom_line(color = "#ffb7c5", alpha = 1/2, linewidth = 1.5) +
  ggplot2::ylab("bias value") +
  ggplot2::theme_bw() +
  ggplot2::theme(panel.background = ggplot2::element_rect(fill = "#4f455c"),
        panel.grid = ggplot2::element_blank())
```


::::
:::::

:::::{.my-watch-out}
:::{.my-watch-out-header}
WATCH OUT! Warning with `tibble::as_tibble()` instead of `base::as.data.frame()`
:::
::::{.my-watch-out-container}
At first I used `tibble::as_tibble()`, as I always do. But this generated a warning message.

> Don't know how to automatically pick scale for object of type <bs/basis/matrix>. Defaulting to continuous.

I learned that it has to do with the difference of a variable name versus a function name in `aes()`. For instance "mean" versus variable "Mean" ([statology](https://www.statology.org/r-dont-know-how-to-automatically-pick-scale-for-object-type-function/)) or "sample" versus "Sample" ([StackOverflow](https://stackoverflow.com/questions/22058322/ggplot-error-dont-know-how-to-automatically-pick-scale-for-object-of-type-func)).

But I do not know why this is the case with `tibble::as_tibble()` but not with `base::as.data.frame()`. Perhaps it has to do with the somewhat stricter tibble naming rules? (see: [tibble overview](https://tibble.tidyverse.org/) and [tbl_df class](https://tibble.tidyverse.org/reference/tbl_df-class.html)).
::::
:::::


###### facet

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-facets-m4-7b}
b: Basis functions of a cubic spline with 15 knots broken up in different facets
::::::
:::
::::{.my-r-code-container}

```{r, fig.height=10}
#| label: fig-chap04-facets-m4-7b
#| fig-cap: "Basis functions of a cubic spline with 15 knots broken up in different facets"
#| fig-height: 10

d_B_m4.7b |> 
  dplyr::mutate(bias_function = stringr::str_c("bias function ", 
                               bias_function)) |> 
  ggplot2::ggplot(ggplot2::aes(x = year, y = bias)) +
  ggplot2::geom_vline(xintercept = knot_list_b, 
             color = "white", alpha = 1/2) +
  ggplot2::geom_line(color = "#ffb7c5", linewidth = 1.5) +
  ggplot2::ylab("bias value") +
  ggplot2::theme_bw() +
  ggplot2::theme(panel.background = ggplot2::element_rect(fill = "#4f455c"),
        panel.grid = ggplot2::element_blank(),
        strip.background = ggplot2::element_rect(
          fill = scales::alpha("#ffb7c5", .25), color = "transparent"),
        strip.text = ggplot2::element_text(
          size = 8, margin = ggplot2::margin(0.1, 0, 0.1, 0, "cm"))) +
  ggplot2::facet_wrap(~ bias_function, ncol = 1)
```

To see what's going on in that plot, we took Kurz's code to break the graphic up, displaying each basis function in its own facet with `ggplot2::facet_wrap()`.

::::
:::::

###### exp

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-exp-prior-m4-7b}
b: Using the density of the exponential distribution `dexp()` to show the prior
::::::
:::
::::{.my-r-code-container}

```{r, fig.height=2.5, fig.width=4}
#| label: fig-chap04-exp-prior-m4-7b
#| fig-cap: "Using the density of the exponential distribution `dexp()` to show the prior"
#| fig-width: 4
#| fig-height: 2.5

tibble::tibble(x = base::seq(from = 0, to = 10, by = 0.1)) |> 
  dplyr::mutate(d = stats::dexp(x, rate = 1)) |> 
  
  ggplot2::ggplot(ggplot2::aes(x = x, y = d)) +
  ggplot2::geom_area(fill = "grey") +
  ggplot2::scale_y_continuous(NULL, breaks = NULL) +
  ggplot2::theme_bw()
```

::::
:::::

We used the `dexp()` function for the model (see tab "formula" in @exm-chap04-fit-cherry-blossoms-m4-7a). To get a sense of what that prior looks like, we displayed the `dexp()` function.
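
A few numbers can complement the picture. This is a small sketch using only the `stats` package. The object names are made up for illustration, and the cutoff of 5 is an arbitrary choice to show how little prior mass lies far from zero.

```r
rate_sigma <- 1    # rate of the Exponential(1) prior on sigma

# mean of an exponential distribution is 1 / rate
prior_mean <- 1 / rate_sigma

# central 89% interval of the prior
prior_pi89 <- stats::qexp(c(0.055, 0.945), rate = rate_sigma)

# prior probability that sigma exceeds 5
prior_above5 <- stats::pexp(5, rate = rate_sigma, lower.tail = FALSE)

prior_mean
prior_pi89
prior_above5
```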

###### matrix

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-wrangle-d3-m4-7b}
b: Add B-splines matrix for 15 knots to the data frame by creating a new data frame
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-wrangle-d3-m4-7b

d3_m4.7b <-
  d2_m4.7b |> 
  dplyr::mutate(B_m4.7b = B_m4.7b) 

# take a look at the structure of the new data frame
d3_m4.7b |> 
  dplyr::glimpse()
```
::::
:::::

In R code 4.76a (tab "fit" in @exm-chap04-fit-cherry-blossoms-m4-7a), McElreath
defined his data in a list (`list(D = d2_m4.7a$doy, B_m4.7a = B_m4.7a)`). The approach with {**brms**} will be a little different. We'll add the `B_m4.7b` matrix to our `d2_m4.7b` data frame and name the result `d3_m4.7b`.


> In the `d3_m4.7b` data, columns `year` through `temp_lower` are all standard
data columns. The `B_m4.7b` column is a *matrix column*, which contains the
same number of rows as the others, but also smuggled in 17 columns
*within* that column. Each of those 17 columns corresponds to one of our
synthetic $B_{k}$ variables. The advantage of such a data structure is
we can simply define our `formula` argument as `doy ~ 1 + B_m4.7b`, where
`B_m4.7b` is a stand-in for `B_m4.7b.1 + B_m4.7b.2 + ... + B_m4.7b.17`. ([Kurz](https://bookdown.org/content/4857/geocentric-models.html#splines.)) 

See next tab "brm".
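
The idea of a matrix column can be tried out in base R before moving to {**brms**}. The following is a minimal sketch with made-up data. `stats::lm()` is used here only as a quick stand-in to show how a formula expands a matrix column. It is not the model fitted in the next tab, and all `_toy` names are hypothetical.

```r
set.seed(4)

# Hypothetical toy data: an outcome and a 3-column matrix of predictors
B_toy <- matrix(stats::rnorm(30), nrow = 10, ncol = 3)
d_toy <- data.frame(y = stats::rnorm(10))
d_toy$B_toy <- B_toy            # store the whole matrix as ONE column

ncol(d_toy)                     # the data frame has 2 columns ...
dim(d_toy$B_toy)                # ... but the second one holds a 10 x 3 matrix

# the single formula term `B_toy` expands into one coefficient per matrix column
fit_toy <- stats::lm(y ~ 1 + B_toy, data = d_toy)
names(stats::coef(fit_toy))
```

The coefficient names are built from the column name plus a running number, which resembles the `b_B_m4.7b1`, ..., `b_B_m4.7b17` parameter names that appear in the later tabs.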

###### brm

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-fit-model-m4-7b}
b: Fit model m4.7b with `brms::brm()`
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-fit-model-m4-7b
#| cache: true

m4.7b <- 
  brms::brm(data = d3_m4.7b,
      family = gaussian,
      doy ~ 1 + B_m4.7b,
      prior = c(brms::prior(normal(100, 10), class = Intercept),
                brms::prior(normal(0, 10), class = b),
                brms::prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.07b")

brms:::print.brmsfit(m4.7b)
```

::::
:::::

> Each of the 17 columns in our `B` matrix was assigned its
own parameter. If you fit this model using McElreath's rethinking code,
you'll see the results are very similar. (Kurz, ibid.)

###### glimpse

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-glimpse-model-m4-7b}
b: Glimpse at transformed data of model `m4.7b` to draw objects
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: chap04-glimpse-model-m4-7b

post_m4.7b <- brms::as_draws_df(m4.7b)

dplyr::glimpse(post_m4.7b)
```

::::
:::::

We used `brms::as_draws_df()` to transform `m4.7b` into a `draws`
object so that it can be processed more easily by the {**posterior**} package.

###### weighted

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-weighted-m4-7b}
b: Weight each basis function by its corresponding parameter
::::::
:::
::::{.my-r-code-container}

```{r, fig.height=3, fig.width=7}
#| label: fig-chap04-weighted-m4-7b
#| fig-cap: "Each basis function weighted by its corresponding parameter"
#| fig-height: 3
#| fig-width: 7
#| warning: false

post_m4.7b |> 
  dplyr::select(b_B_m4.7b1:b_B_m4.7b17) |> 
  rlang::set_names(base::c(stringr::str_c(0, 1:9), 10:17)) |> 
  tidyr::pivot_longer(tidyselect::everything(), 
                      names_to = "bias_function") |> 
  dplyr::group_by(bias_function) |> 
  dplyr::summarise(weight = base::mean(value)) |> 
  ## add weight column to year & bias via "bias_function"
  dplyr::full_join(d_B_m4.7b, by = "bias_function") |> 
  
  # plot
  ggplot2::ggplot(ggplot2::aes(x = year, 
                        y = bias * weight, 
                        group = bias_function)) +
  ggplot2::geom_vline(xintercept = knot_list_b, 
                      color = "white", alpha = 1/2) +
  ggplot2::geom_line(color = "#ffb7c5", 
                     alpha = 1/2, linewidth = 1.5) +
  ggplot2::theme_bw() +
  ggplot2::theme(panel.background = 
                 ggplot2::element_rect(fill = "#4f455c"),
                 panel.grid = ggplot2::element_blank()) 
```

> In case you missed it, the main action in the {**ggplot2**} code was `y = bias * weight`, where we defined the $y$-axis as the product of `bias` and `weight`. This is fulfillment of the $w_k B_{k, i}$ parts of the model. (Kurz, ibid.)


::::
:::::

###### plot2

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-chap04-post-m4-7b}
b: Expected values of the posterior predictive distribution
::::::
:::
::::{.my-r-code-container}


```{r, fig.height=3, fig.width=7}
#| label: fig-chap04-post-pred-dist-m4-7b
#| fig-cap: "Expected values of the posterior predictive distribution"
#| fig-height: 3
#| fig-width: 7

f_m4.7b <- brms:::fitted.brmsfit(m4.7b, probs = c(0.055, 0.945))

f_m4.7b |> 
  base::as.data.frame() |> 
  dplyr::bind_cols(d3_m4.7b) |> 
  
  ggplot2::ggplot(ggplot2::aes(x = year, y = doy, 
                               ymin = Q5.5, ymax = Q94.5)) + 
  ggplot2::geom_vline(xintercept = knot_list_b, 
                      color = "white", alpha = 1/2) +
  ggplot2::geom_hline(yintercept = 
                      brms::fixef(m4.7b, 
                                  probs = c(0.055, 0.945))[1, 1], 
                      color = "white", linetype = 2) +
  ggplot2::geom_point(color = "#ffb7c5", alpha = 1/2) +
  ggplot2::geom_ribbon(fill = "white", alpha = 2/3) +
  ggplot2::labs(x = "year", y = "day in year") +
  ggplot2::theme_bw() +
  ggplot2::theme(panel.background = 
                 ggplot2::element_rect(fill = "#4f455c"),
                 panel.grid = ggplot2::element_blank())
```

> If it wasn’t clear, the dashed horizontal line intersecting a little above $100$ on the $y$-axis is the posterior mean for the intercept. (Kurz, ibid.)

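In contrast to Kurz’ code I have used `probs = c(0.055, 0.945)` to get 89% intervals (Q5.5 and Q94.5), the default interval width in the book. Note that the Original version in tab "plot2" of @exm-chap04-fit-cherry-blossoms-m4-7a shows a 97% interval instead.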
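
Why these two probabilities give an 89% interval can be verified with a few simulated draws. This is a minimal sketch with made-up numbers. `draws_toy` is a hypothetical stand-in for posterior draws of $\mu$ at one year, not output of `m4.7b`.

```r
set.seed(4)

# hypothetical posterior draws
draws_toy <- stats::rnorm(1e4, mean = 100, sd = 2)

# lower and upper bound at the 5.5% and 94.5% quantiles
pi89_toy <- stats::quantile(draws_toy, probs = c(0.055, 0.945))
pi89_toy

# share of draws that fall inside the interval: 0.945 - 0.055 = 0.89
coverage_toy <- mean(draws_toy >= pi89_toy[1] & draws_toy <= pi89_toy[2])
coverage_toy
```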

::::
:::::


:::

::::
:::::


### Smooth functions for a rough world

> “The splines in the previous section are just the beginning. A entire class of models, generalized additive models (`r glossary("GAM")`s), focuses on predicting an outcome variable using smooth functions of some predictor variables.” ([McElreath, 2020, p. 120](zotero://select/groups/5243560/items/NFUEVASQ)) ([pdf](zotero://open-pdf/groups/5243560/items/CPFRPHX8?page=139&annotation=UX53SXQD))

:::::{.my-resource}
:::{.my-resource-header}
Resources for working with Generalized Additive Models (GAMs)
:::
::::{.my-resource-container}
-   Wood, S. N. (2017). Generalized Additive Models: An Introduction
    with R, Second Edition (2nd ed.). Taylor & Francis Inc.
-   SemanticScholar: [Series of papers dedicated to
    GAMs](https://www.semanticscholar.org/paper/Generalized-Additive-Models%3A-An-Introduction-with-R-G%C3%B3mez%E2%80%90Rubio/025f25133a5c1da746eb7e7719bb715b71a7f518)
-   Anish Singh Walia: [Generalized Additive
    Model](https://datascienceplus.com/generalized-additive-models/)
-   Noam Ross: [GAMs in
    R](https://noamross.github.io/gams-in-r-course/): A Free,
    Interactive Course using `mgcv`
-   Michael Clark: [Generalized Additive
    Models](https://m-clark.github.io/generalized-additive-models/)
-   Dheeraj Vaidya: [Generalized Additive
    Model](https://www.wallstreetmojo.com/generalized-additive-model/)

------------------------------------------------------------------------

-   Adam Shaif: What is Generalised Additive Model? ([Medium member
    story](https://towardsdatascience.com/generalised-additive-models-6dfbedf1350a))
-   Eugenio Anello: Generalized Additive Models with R ([Medium member
    story](https://pub.towardsai.net/generalized-additive-models-with-r-5f01c8e52089))

***

In the bonus section [Kurz](https://bookdown.org/content/4857/geocentric-models.html#summary-first-bonus-smooth-functions-with-brmss) added some more references for GAMs:

-   For more on the B-splines and smooths, more generally, check out the
    blog post by the great [Gavin Simpson](https://twitter.com/ucfagls),
    [Extrapolating with B splines and
    GAMs](https://fromthebottomoftheheap.net/2020/06/03/extrapolating-with-gams/).
-   For a high-level introduction to the models you can fit with
    {**mgcv**}, check out the nice talk by [Noam
    Ross](https://twitter.com/noamross), [Nonlinear models in R: The
    wonderful world of mgcv](https://youtu.be/q4_t8jXcQgc), or the
    equally-nice presentation by Simpson, [Introduction to generalized
    additive models with R and mgcv](https://youtu.be/sgw4cu8hrZM).
-   Ross offers a free online course covering {**mgcv**}, called [GAMS
    in R](https://noamross.github.io/gams-in-r-course/), and he
    maintains a GitHub repo cataloging other GAM-related resources,
    called [Resources for learning about and using GAMs in
    R](https://github.com/noamross/gam-resources).
-   For specific examples of fitting various GAMS with {**brms**}, check
    out Simpson's blog post, [Fitting GAMs with brms: part
    1](https://fromthebottomoftheheap.net/2018/04/21/fitting-gams-with-brms/).
-   Finally, [Tristan Mahr](https://twitter.com/tjmahr) has a nice blog
    post called [Random effects and penalized splines are the same
    thing](https://www.tjmahr.com/random-effects-penalized-splines-same-thing/),
    where he outlined the connections between penalized smooths, such as
    you might fit with {**mgcv**}, with the multilevel model, which
    we'll learn all about starting in @sec-chap13.
    
::::
:::::

In this last reference Kurz announces that in @sec-chap13 we will better understand what is going on with the `s()` function that he described in his first bonus section. As I didn't understand this bonus section, I skipped it and will come back to it when I have learned more about GAMs and related model fitting procedures. 

In the second bonus section Kurz explained that the compact syntax for passing a matrix column of predictors into the formula, as used in tab "matrix" of @exm-chap04-fit-cherry-blossoms-m4-7b, is not an isolated trick but a general approach. He referenced an example of a multiple regression
model in Section 11.2.6 of the book "Regression and Other Stories" by Gelman, Hill, and Vehtari. I skip this second bonus section too, as I do not have sufficient knowledge to fully understand the procedure.

## Practice

Problems are labeled Easy (E), Medium (M), and Hard (H).

### Easy

#### 4E1

:::::{.my-exercise}
:::{.my-exercise-header}
Exercise 4E1: In the model definition below, which line is the `r glossary("likelihood")`?
:::
::::{.my-exercise-container}

$$
\begin{align*}
y_{i} \sim \operatorname{Normal}(\mu, \sigma) \\
\mu \sim \operatorname{Normal}(0, 10) \\
\sigma \sim \operatorname{Exponential}(1)
\end{align*}
$$ {#eq-4e1}

***

In this kind of model description the first line is always the likelihood, the other lines are the priors.

::::
:::::

#### 4E2

:::::{.my-exercise}
:::{.my-exercise-header}
Exercise 4E2: In the model definition @eq-4e1, how many `r glossary("parameter")`s are in the `r glossary("posterior distribution")`?
:::
::::{.my-exercise-container}
There are two parameters in the posterior distribution: $\mu$ and $\sigma$. Each of them has its own prior.
::::
:::::

#### 4E3

:::::{.my-exercise}
:::{.my-exercise-header}
Exercise 4E3: Using the model definition @eq-4e1, write down the appropriate form of `r glossary("Bayes’ theorem")` that includes the proper likelihood and `r glossary("prior probability", "priors")`.

:::
::::{.my-exercise-container}

Remember @eq-in-words:

$$
\text{Posterior} = \frac{\text{Probability of data} \times \text{Prior}}{\text{Average probability of data}}
$$
$$
\Pr(\mu, \sigma \mid y) = \frac{\prod_{i}\operatorname{Normal}(y_{i} \mid \mu, \sigma)\operatorname{Normal}(\mu \mid 0, 10)\operatorname{Exponential}(\sigma \mid 1)}{\int\int\prod_{i}\operatorname{Normal}(y_{i} \mid \mu, \sigma)\operatorname{Normal}(\mu \mid 0, 10)\operatorname{Exponential}(\sigma \mid 1)\,d\mu\,d\sigma}
$$ {#eq-4e3}
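
To make the numerator and the denominator of the formula above more tangible, here is a minimal grid approximation sketch. The observations `y_toy` are simulated, hypothetical data, and the grid limits are arbitrary choices of mine. On a grid, the double integral in the denominator is replaced by a sum over all grid points.

```r
set.seed(4)
y_toy <- stats::rnorm(20, mean = 3, sd = 1.5)    # hypothetical observations

# grid over both parameters
grid <- expand.grid(mu    = seq(-5, 10, length.out = 200),
                    sigma = seq(0.05, 6, length.out = 200))

# log numerator: sum of the log-likelihoods plus the two log-priors
grid$log_num <- sapply(seq_len(nrow(grid)), function(i) {
  sum(stats::dnorm(y_toy, mean = grid$mu[i], sd = grid$sigma[i], log = TRUE))
}) +
  stats::dnorm(grid$mu, mean = 0, sd = 10, log = TRUE) +
  stats::dexp(grid$sigma, rate = 1, log = TRUE)

# back to the probability scale (subtracting the maximum avoids underflow),
# then divide by the sum over the grid, which plays the role of the denominator
num <- exp(grid$log_num - max(grid$log_num))
grid$posterior <- num / sum(num)

# grid point with the highest posterior probability
map_toy <- grid[which.max(grid$posterior), c("mu", "sigma")]
map_toy
```

Working on the log scale is a design choice for numerical stability: the product of 20 small likelihood values would otherwise get very close to zero.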

:::::{.my-watch-out}
:::{.my-watch-out-header}
WATCH OUT! I didn't know the answer and had to look it up in the [solutions by Jake Thompson](https://sr2-solutions.wjakethompson.com/linear-models-causal-inference#chapter-4).
:::
:::::


::::
:::::


#### 4E4

:::::{.my-exercise}
:::{.my-exercise-header}
Exercise 4E4: In the model definition below, which line is the linear model?
:::
::::{.my-exercise-container}

$$
\begin{align*}
y_{i} \sim \operatorname{Normal}(\mu, \sigma) \\
\mu_{i} = \alpha + \beta{x_{i}} \\
\alpha \sim \operatorname{Normal}(0, 10) \\
\beta \sim \operatorname{Normal}(0, 1) \\
\sigma \sim \operatorname{Exponential}(1)
\end{align*}
$$ {#eq-4e4}

***

The second line ($\mu_{i} = \alpha + \beta{x_{i}}$) is the linear model.
::::
:::::

#### 4E5

:::::{.my-exercise}
:::{.my-exercise-header}
Exercise 4E5: In the model definition @eq-4e4, how many parameters are in the
posterior distribution?
:::
::::{.my-exercise-container}

:::::{.my-important}
:::{.my-important-header}
Distinguish between deterministic (`=`) and stochastic (`~`) equations!
:::
:::::

> There are now three model parameters: $\alpha$, $\beta$, and $\sigma$. The mean, $\mu$, is no longer a parameter, as it is defined deterministically, as a function of other parameters in the model.

:::::{.my-watch-out}
:::{.my-watch-out-header}
WATCH OUT! $\alpha$ and $\beta$ are unobserved variables and therefore parameters!
:::
::::{.my-watch-out-container}

Here comes my completely wrong answer:

> There are two parameters in the posterior distribution: $\mu$ and $\sigma$.
 
I did not understand that $\mu$ is not a `r glossary("parameter")` anymore and thought, exactly the reverse, that $\alpha$ and $\beta$ are not parameters but $\mu$ and $\sigma$ are.
::::
:::::

::::
:::::

### Medium

#### 4M1 {#sec-4m1}
:::::{.my-exercise}
:::{.my-exercise-header}
Exercise 4M1: For the model definition below, simulate observed $y$ values from the prior (not the posterior).
:::
::::{.my-exercise-container}

$$
\begin{align*}
y_{i} \sim \operatorname{Normal}(\mu, \sigma) \\
\mu \sim \operatorname{Normal}(0, 10) \\
\sigma \sim \operatorname{Exponential}(1)
\end{align*}
$$

See the discussion under @sec-prior-predictive-sim and the code in @exm-prior-predictive-sim.

***

::: {.panel-tabset}

###### Original


:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-exercise-4m1a}
a: Exercise 4M1a: Simulate observed $y$ values from the prior
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-exercise-4m1a
#| fig-cap: "Simulate observed y values from the prior"

k.sim_4m1a <- 1e4
set.seed(4) # to make exercise reproducible

## R code 4.14a adapted #######################################
sample_mu_4m1a <- rnorm(k.sim_4m1a, mean = 0, sd = 10)
sample_sigma_4m1a <- rexp(k.sim_4m1a, rate = 1)
priors_4m1a <- rnorm(k.sim_4m1a, sample_mu_4m1a, sample_sigma_4m1a)
rethinking::dens(priors_4m1a, 
                 adj = 1, 
                 norm.comp = TRUE,
                 show.zero = TRUE,
                 col = "red")
```

::::
:::::

###### Tidyverse

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-exercise-4m1b}
b: Exercise 4M1b: Simulate observed $y$ values from the prior
::::::
:::
::::{.my-r-code-container}
```{r}
#| label: fig-exercise-4m1b
#| fig-cap: "Simulate observed y values from the prior"

k.sim_4m1b <- 1e4
set.seed(4) # to make exercise reproducible

## R code 4.14b #######################################

sim_4m1b <-
  tibble::tibble(
    sample_mu_4m1b = stats::rnorm(k.sim_4m1b, mean = 0, sd  = 10),
    sample_sigma_4m1b = stats::rexp(k.sim_4m1b, rate = 1)) |> 
  dplyr::mutate(priors_4m1b = stats::rnorm(k.sim_4m1b, 
                                           mean = sample_mu_4m1b, 
                                           sd = sample_sigma_4m1b))
  
sim_4m1b |> 
  ggplot2::ggplot(ggplot2::aes(x = priors_4m1b)) +
  ggplot2::geom_density(color = "red") +
  ggplot2::stat_function(
        fun = dnorm,
        args = with(sim_4m1b, 
               c(mean = mean(priors_4m1b), 
                 sd = sd(priors_4m1b)))
        ) +
  ggplot2::theme_bw()


```

::::
:::::

:::
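As a plausibility check on the simulations above, the spread of the prior predictive distribution can also be worked out by hand. By the law of total variance, $\operatorname{Var}(y) = \operatorname{Var}(\mu) + \operatorname{E}[\sigma^2] = 10^2 + 2 = 102$, because an $\operatorname{Exponential}(1)$ variable has mean 1 and variance 1, so $\operatorname{E}[\sigma^2] = 1 + 1^2 = 2$. The following is a minimal sketch of my own in base R; the object names (`mu_sim`, `sigma_sim`, `y_sim`, `sd_theory`) are hypothetical and not used elsewhere in this chapter.

```{r}
#| label: exercise-4m1-sd-check

set.seed(4) # arbitrary seed, only for reproducibility
n_sim <- 1e4

# draw parameters from their priors, then y from the likelihood
mu_sim    <- stats::rnorm(n_sim, mean = 0, sd = 10)
sigma_sim <- stats::rexp(n_sim, rate = 1)
y_sim     <- stats::rnorm(n_sim, mean = mu_sim, sd = sigma_sim)

# Var(y) = Var(mu) + E[sigma^2] = 100 + 2
sd_theory <- sqrt(10^2 + 2)

c(simulated = stats::sd(y_sim), theoretical = sd_theory)
```

The simulated standard deviation should land close to the theoretical value of about 10.1, which also explains why the density of the simulated $y$ values looks almost like the $\operatorname{Normal}(0, 10)$ prior of $\mu$.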

::::
:::::

#### 4M2

:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-4m2}
: Translate the model of @sec-4m1 into a `quap()` and `brm()` formula
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### quap()

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-code-4m2a}
a: Translate the model of @sec-4m1 into a `quap()` formula. (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: exercise-4m2a

flist_4m2a <- 
  alist(
    y ~ dnorm(mu, sigma),
    mu ~ dnorm(0,10),
    sigma ~ dexp(1)
  )

flist_4m2a
```


::::
:::::


###### brm()

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-code-4m2b}
b: Translate the model of @sec-4m1 into a `brm()` formula. (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: exercise-4m2b
#| eval: false

exr_4m2b <- 
  brms::brm(
      data = d2_b, # placeholder data: brm() needs a data frame containing a column named y
      family = gaussian(),
      formula = y ~ 1,
      prior = c(brms::prior(normal(0, 10), class = Intercept),
                brms::prior(exponential(1), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/m04.4m2b")
```

::::
:::::

:::

::::
:::::

#### 4M3

:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-4m3}
: Translate the `quap()` model formula below into a mathematical model definition
::::::
:::
::::{.my-example-container}

```
y ~ dnorm(mu ,sigma), 
mu <- a + b * x, 
a ~ dnorm(0, 10), 
b ~ dunif(0, 1),
sigma ~ dexp(1)
```

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-code-4m3a}
: Translate the `quap()` model formula into a mathematical model definition
::::::
:::
::::{.my-r-code-container}

$$
\begin{align*}
y_{i} \sim \operatorname{Normal}(\mu_{i}, \sigma) \\
\mu_{i} = \alpha + \beta x_{i} \\
\alpha \sim \operatorname{Normal}(0, 10) \\
\beta \sim \operatorname{Uniform}(0, 1) \\
\sigma \sim \operatorname{Exponential}(1)
\end{align*}
$$

::::
:::::


::::
:::::

#### 4M4

:::::{.my-exercise}
:::{.my-exercise-header}
Exercise 4M4: A sample of students is measured for height each year for 3 years. After the third year, you want to fit a linear regression predicting height using year as a predictor. Write down the mathematical model definition for this regression, using any variable names and priors you choose. Be prepared to defend your choice of priors.
:::
::::{.my-exercise-container}

$$
\begin{align*}
\text{height}_{i} \sim \operatorname{Normal}(\mu_{i}, \sigma) \\
\mu_{i} = \alpha + \beta(x_{i} - \overline{x}) \\
\alpha \sim \operatorname{Normal}(180, 20) \\
\beta \sim \operatorname{Log-Normal}(0, 10) \\
\sigma \sim \operatorname{Exponential}(1)
\end{align*}
$$
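To defend (or revise) these priors, it helps to look at what they imply before seeing any data. The sketch below is my own addition and not part of the book's solution. It assumes that height is measured in cm and that $x$ is the year, so $\beta$ is growth in cm per year. It computes quantiles of the slope prior as written above, Log-Normal(0, 10), and compares them with a narrower Log-Normal(0, 1), which is only an assumed alternative for illustration.

```{r}
#| label: exercise-4m4-prior-check

probs <- c(0.055, 0.5, 0.945)

# slope prior as written in the model definition above
beta_wide <- stats::qlnorm(probs, meanlog = 0, sdlog = 10)

# narrower alternative (my assumption for comparison, not from the source)
beta_narrow <- stats::qlnorm(probs, meanlog = 0, sdlog = 1)

prior_check <- data.frame(
  prob                = probs,
  beta_lognormal_0_10 = signif(beta_wide, 3),
  beta_lognormal_0_1  = signif(beta_narrow, 3)
)
prior_check
```

Both priors have a median of 1, since $\exp(0) = 1$. With `sdlog = 10`, however, the 94.5% quantile is roughly $\exp(16)$, that is, several million cm per year, which would be hard to defend as a growth rate. A tighter slope prior is probably easier to justify.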


::::
:::::

#### 4M7

:::::{.my-exercise}
:::{.my-exercise-header}
Exercise 4M7: Refit model `m4.3` from the chapter, but omit the mean weight `xbar` this time. Compare the new model’s posterior to that of the original model. In particular, look at the covariance among the parameters. What is different? Then compare the posterior predictions of both models.
:::
::::{.my-exercise-container}

:::::{.my-example}
:::{.my-example-header}
:::::: {#exm-ID-text}
: Example 4M7: Refit model `m4.3` without mean weight `xbar`
::::::
:::
::::{.my-example-container}

::: {.panel-tabset}

###### Original

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-4m7a}
a: Example 4M7a: Refit model `m4.3a` without mean weight `xbar` (Original)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-exercise-4m7a
#| results: hold
#| fig-cap: "Plotting posterior inference against the data (Original)"

## R code 4.42a adapted #############################

# fit model
exr_4m7a <- rethinking::quap(
  alist(
    height ~ dnorm(mu, sigma),
    mu <- a + (b * weight),
    a ~ dnorm(178, 20),
    b ~ dlnorm(0, 1),
    sigma ~ dunif(0, 50)
  ),
  data = d2_a
)

## R code 4.44a ############################
cat("**************** Summary **********************\n")
rethinking::precis(exr_4m7a)

## R code 4.45 ################
cat("\n\n*************** Covariances ********************\n")
round(rethinking::vcov(exr_4m7a), 3)

# rethinking::pairs(exr_4m7a)

## R code 4.46a ############################################
plot(height ~ weight, data = d2_a, col = rethinking::rangi2)
post_exr_4m7a <- rethinking::extract.samples(exr_4m7a)
a_map <- mean(post_exr_4m7a$a)
b_map <- mean(post_exr_4m7a$b)
curve(a_map + (b_map * x) , add = TRUE)
```

::::
:::::


###### Tidyverse

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-fig-4m7b}
b: Example 4M7b: Refit model `m4.3b` without mean weight `xbar` (Tidyverse)
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: fig-exercise-4m7b
#| cache: true
#| results: hold
#| fig-cap: "Plotting posterior inference against the data (Tidyverse)"
#| warning: false

exr_4m7b <- 
  brms::brm(data = d2_b, 
      formula = height ~ 1 + weight,
      family = gaussian(),
      prior = c(brms::prior(normal(178, 20), class = Intercept),
                brms::prior(lognormal(0, 1), class = b, lb = 0),
                brms::prior(uniform(0, 50), class = sigma, ub = 50)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "brm_fits/exr_4m7b")

cat("**************** Summary **********************\n")
base::summary(exr_4m7b) # dispatches to the brmsfit method; no need for `:::`

cat("\n\n*************** Covariances ********************\n")
brms::as_draws_df(exr_4m7b) |>
  dplyr::select(b_Intercept:sigma) |>
  stats::cov() |>
  base::round(digits = 3)

d2_b |>
  ggplot2::ggplot(ggplot2::aes(x = weight, y = height)) +
  ggplot2::geom_abline(
    intercept = brms::fixef(exr_4m7b)["Intercept", "Estimate"], 
    slope     = brms::fixef(exr_4m7b)["weight", "Estimate"]) +
  ggplot2::geom_point(shape = 1, size = 2, color = "royalblue") +
  ggplot2::labs(x = "weight", y = "height") +
  ggplot2::theme_bw()
```

::::
:::::

:::

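In contrast to the `vcov()` values of the centered-weight version in @exm-chap04-table-interpretation, we now have large covariances, most notably between the intercept and the slope. Without centering, the intercept is the expected height at a weight of zero, far outside the observed data, so any change in the slope has to be compensated by a change in the intercept. The posterior predictions of both models (centered and uncentered) are nonetheless nearly the same. Compare with @fig-raw-data-line-m4-3a (Original) and @fig-raw-data-line2-m4-3b (Tidyverse).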
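The effect of centering on the covariance can also be reproduced without the Howell data. The following sketch is my own illustration with simulated data and ordinary least squares (`stats::lm()`), not the `quap()` or `brm()` fits above. The data-generating values and all object names are arbitrary assumptions.

```{r}
#| label: exercise-4m7-centering-sketch

set.seed(4) # arbitrary seed, only for reproducibility
n <- 300

# hypothetical data: weights in kg, heights in cm
weight_sim <- stats::rnorm(n, mean = 45, sd = 6)
height_sim <- 114 + 0.9 * weight_sim + stats::rnorm(n, mean = 0, sd = 5)
weight_c   <- weight_sim - mean(weight_sim) # centered predictor

fit_raw      <- stats::lm(height_sim ~ weight_sim)
fit_centered <- stats::lm(height_sim ~ weight_c)

# correlation between the intercept and slope estimates
cor_raw      <- stats::cov2cor(stats::vcov(fit_raw))[1, 2]
cor_centered <- stats::cov2cor(stats::vcov(fit_centered))[1, 2]
round(c(uncentered = cor_raw, centered = cor_centered), 3)

# both parameterizations describe the same fitted line
all.equal(unname(stats::fitted(fit_raw)), unname(stats::fitted(fit_centered)))
```

Under these assumptions the correlation between intercept and slope is close to -1 without centering and essentially 0 with centering, while the fitted values agree. This mirrors the pattern in the two model fits above.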

::::
:::::

***


::::
:::::


## Session Info

:::::{.my-r-code}
:::{.my-r-code-header}
:::::: {#cnj-chap04-session-info}
: Session Info
::::::
:::
::::{.my-r-code-container}

```{r}
#| label: session-info

sessionInfo()
```

::::
:::::