Slides 13
gpleiss committed Oct 9, 2024
1 parent c4205b4 commit 16b7d95
Showing 10 changed files with 3,743 additions and 2,437 deletions.

Large diffs are not rendered by default.

705 changes: 351 additions & 354 deletions _freeze/schedule/slides/13-gams-trees/figure-revealjs/big-tree-1.svg
174 changes: 84 additions & 90 deletions _freeze/schedule/slides/13-gams-trees/figure-revealjs/gam-mod-1.svg
1,232 changes: 612 additions & 620 deletions _freeze/schedule/slides/13-gams-trees/figure-revealjs/partition-view-1.svg
2,283 changes: 1,137 additions & 1,146 deletions _freeze/schedule/slides/13-gams-trees/figure-revealjs/unnamed-chunk-1-1.svg
195 changes: 141 additions & 54 deletions schedule/slides/13-gams-trees.qmd
@@ -83,9 +83,32 @@ $\Expect{Y \given X=x} = \beta_0 + f_1(x_{1})+\cdots+f_p(x_{p}),$

then

$\textrm{MSE}(\hat f) = \frac{Cp}{n^{4/5}} + \sigma^2.$
$$
R_n^{(\mathrm{GAM})} =
\underbrace{\frac{C_1^{(\mathrm{GAM})}}{n^{4/5}}}_{\mathrm{bias}^2} +
\underbrace{\frac{C_2^{(\mathrm{GAM})}}{n^{4/5}}}_{\mathrm{var}} +
\sigma^2.
$$
Compare with OLS and non-additive local smoothers:

$$
R_n^{(\mathrm{OLS})} =
\underbrace{C_1^{(\mathrm{OLS})}}_{\mathrm{bias}^2} +
\underbrace{\tfrac{C_2^{(\mathrm{OLS})}}{n/p}}_{\mathrm{var}} +
\sigma^2,
\qquad
R_n^{(\mathrm{local})} =
\underbrace{\tfrac{C_1^{(\mathrm{local})}}{n^{4/(4+p)}}}_{\mathrm{bias}^2} +
\underbrace{\tfrac{C_2^{(\mathrm{local})}}{n^{4/(4+p)}}}_{\mathrm{var}} +
\sigma^2.
$$
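To see the rate difference concretely, here is a small numeric sketch (illustrative only, not part of the original slides) of the error exponents: the additive rate $n^{-4/5}$ versus the fully nonparametric rate $n^{-4/(4+p)}$, with the constants $C_1$ and $C_2$ ignored.

```{r}
#| code-fold: true
# Illustrative sketch: how the bias^2 + variance terms shrink with n,
# ignoring the constants C_1 and C_2.
rates <- expand.grid(n = c(100, 1000, 10000), p = c(2, 5, 10))
rates$gam_rate   <- rates$n^(-4 / 5)              # additive model: exponent free of p
rates$local_rate <- rates$n^(-4 / (4 + rates$p))  # local smoother: exponent decays with p
rates
```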

* Exponent no longer depends on $p$. Converges faster. (If the truth is additive.)
---

* We no longer have an exponential dependence on $p$!

* But our predictor is restricted to functions that decompose additively.
(This is a big limitation.)

* You could also use the same methods to include "some" interactions like

@@ -108,64 +131,53 @@ plot(ex_smooth2,

## Regression trees

Trees involve stratifying or segmenting the predictor space into a number of simple regions.

Trees are simple and useful for interpretation.

Basic trees are not great at prediction.

Modern methods that use trees are much better (Module 4)

## Regression trees

Regression trees estimate piecewise-constant functions.

The regions are axis-parallel rectangles $R_1,\ldots,R_K$ based on $\X$.

In each region, we average the $y_i$'s: $\hat\mu_1,\ldots,\hat\mu_K$.

Minimize $\sum_{k=1}^K \sum_{i:\, x_i \in R_k} (y_i-\mu_k)^2$ over $R_k,\mu_k$ for $k\in \{1,\ldots,K\}$.

. . .

This sounds more complicated than it is.

The minimization is performed __greedily__ (like forward stepwise regression).
* Trees involve stratifying or segmenting the predictor space into a number of simple regions.
* Trees are simple and useful for interpretation.
* Basic trees are not great at prediction.
* Modern methods that use trees are much better (Module 4)
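As a toy illustration of the least-squares criterion above (a sketch with simulated data, not from the original slides): once the regions are fixed, the best constant $\hat\mu_k$ in each region is the within-region average, and the criterion is the pooled squared error.

```{r}
#| code-fold: true
# Hypothetical 1-D example: three fixed regions, region-wise means, and
# the criterion sum_k sum_{i in R_k} (y_i - mu_k)^2.
set.seed(1)
x <- runif(100)
y <- sin(2 * pi * x) + rnorm(100, sd = 0.2)
regions <- cut(x, breaks = c(0, 1 / 3, 2 / 3, 1))  # K = 3 intervals ("rectangles" in 1-D)
mu_hat <- ave(y, regions)                          # each point gets its region's average
sum((y - mu_hat)^2)                                # the quantity minimized over regions
```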


## Example with mobility data

##
::: flex
::: w-50


![](https://www.aafp.org/dam/AAFP/images/journals/blogs/inpractice/covid_dx_algorithm4.png)



## Mobility data

```{r small-tree-prelim, echo=FALSE}
"Small" tree
```{r}
#| code-fold: true
#| fig-width: 8
data("mobility", package = "Stat406")
library(tree)
library(maptree)
mob <- mobility[complete.cases(mobility), ] %>% dplyr::select(-ID, -Name)
set.seed(12345)
par(mar = c(0, 0, 0, 0), oma = c(0, 0, 0, 0))
```

```{r}
#| fig-width: 8
bigtree <- tree(Mobility ~ ., data = mob)
smalltree <- prune.tree(bigtree, k = .09)
draw.tree(smalltree, digits = 2)
```
:::

This is called the [dendrogram]{.secondary}
::: w-50
"Big" tree
```{r big-tree, echo=FALSE}
#| fig-width: 8
#| fig-height: 5
draw.tree(bigtree, digits = 2)
```
:::
:::

[Terminology]{.secondary}

* We call each split or end point a *node*.
* Each terminal node is referred to as a *leaf*.

## Partition view
## Example with mobility data

```{r partition-view}
#| fig-width: 8
#| code-fold: true
#| fig-width: 10
mob$preds <- predict(smalltree)
par(mfrow = c(1, 2), mar = c(5, 3, 0, 0))
draw.tree(smalltree, digits = 2)
@@ -178,24 +190,97 @@ partition.tree(smalltree, add = TRUE, ordvars = c("Black", "Commute"))
```


We predict all observations in a region with the same value.
$\bullet$ The three regions correspond to the leaves of the tree.
[(The three regions correspond to the leaves of the tree.)]{.small}
\

* Trees are *piecewise constant functions*.\
[We predict all observations in a region with the same value; see the quick check below.]{.small}
* Prediction regions are axis-parallel rectangles $R_1,\ldots,R_K$ based on $\X$
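A quick sanity check of the piecewise-constant claim (assuming the `smalltree` object fit in the earlier chunk): the fitted values take only one distinct value per leaf.

```{r}
# The predictions are constant within each region, so (generically)
# the number of distinct fitted values equals the number of leaves.
length(unique(predict(smalltree)))
table(round(predict(smalltree), 3))
```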

##

```{r big-tree}
#| fig-width: 8
#| fig-height: 5
draw.tree(bigtree, digits = 2)
```

<!-- ## -->

[Terminology]{.secondary}

We call each split or end point a node. Each terminal node is referred to as a leaf.
<!-- ![](https://www.aafp.org/dam/AAFP/images/journals/blogs/inpractice/covid_dx_algorithm4.png) -->


<!-- ## Dendrogram view -->

<!-- ```{r} -->
<!-- #| code-fold: true -->
<!-- #| fig-width: 8 -->
<!-- data("mobility", package = "Stat406") -->
<!-- library(tree) -->
<!-- library(maptree) -->
<!-- mob <- mobility[complete.cases(mobility), ] %>% dplyr::select(-ID, -Name) -->
<!-- set.seed(12345) -->
<!-- par(mar = c(0, 0, 0, 0), oma = c(0, 0, 0, 0)) -->
<!-- smalltree <- prune.tree(bigtree, k = .09) -->
<!-- draw.tree(smalltree, digits = 2) -->
<!-- ``` -->

<!-- This is called the [dendrogram]{.secondary} -->


<!-- ## Partition view -->

<!-- ```{r partition-view} -->
<!-- #| code-fold: true -->
<!-- #| fig-width: 10 -->
<!-- mob$preds <- predict(smalltree) -->
<!-- par(mfrow = c(1, 2), mar = c(5, 3, 0, 0)) -->
<!-- draw.tree(smalltree, digits = 2) -->
<!-- cols <- viridisLite::viridis(20, direction = -1)[cut(log(mob$Mobility), 20)] -->
<!-- plot(mob$Black, mob$Commute, -->
<!-- pch = 19, cex = .4, bty = "n", las = 1, col = cols, -->
<!-- ylab = "Commute time", xlab = "% Black" -->
<!-- ) -->
<!-- partition.tree(smalltree, add = TRUE, ordvars = c("Black", "Commute")) -->
<!-- ``` -->

The interior nodes lead to branches.


## Constructing Trees

::: flex
::: w-60

Iterative algorithm:

* While ($\mathtt{depth} \ne \mathtt{max.depth}$):
* For each existing region $R_k$
* For a given *splitting variable* $j$ and *split value* $s$,
define
$$
\begin{align}
R_k^> &= \{x \in R_k : x^{(j)} > s\} \\
R_k^< &= \{x \in R_k : x^{(j)} \le s\}
\end{align}
$$
* Choose $j$ and $s$
to minimize
$$|R_k^>| \cdot \widehat{Var}(R_k^>) + |R_k^<| \cdot \widehat{Var}(R_k^<)$$

:::

::: w-35
```{r echo=FALSE}
#| fig-width: 5
#| fig-height: 4
plot(mob$Black, mob$Commute,
pch = 19, cex = .4, bty = "n", las = 1, col = cols,
ylab = "Commute time", xlab = "% Black"
)
partition.tree(smalltree, add = TRUE, ordvars = c("Black", "Commute"))
```
::: fragment
This algorithm is *greedy*, so it doesn't find the optimal tree\
[(But it works well! A worked sketch of the split search follows this slide.)]{.small}

:::
:::
:::
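To make one greedy step concrete, here is a minimal base-R sketch of the split search (illustrative only; it is not how the `tree` package is implemented).

```{r}
#| code-fold: true
# Hypothetical sketch: scan every (variable j, split value s) pair and return the
# one minimizing |R>| * Var(R>) + |R<| * Var(R<), i.e. the total within-child SSE.
best_split <- function(X, y) {
  sse <- function(v) sum((v - mean(v))^2)  # |R| times the (MLE) variance estimate
  best <- list(score = Inf)
  for (j in seq_len(ncol(X))) {
    for (s in unique(X[, j])) {
      hi <- X[, j] > s
      if (!any(hi) || all(hi)) next        # skip splits that leave a child empty
      score <- sse(y[hi]) + sse(y[!hi])
      if (score < best$score) best <- list(var = colnames(X)[j], split = s, score = score)
    }
  }
  best
}
# Example call on the mobility data used above:
best_split(as.matrix(mob[, c("Black", "Commute")]), mob$Mobility)
```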


## Advantages and disadvantages of trees
Expand All @@ -206,11 +291,13 @@ The interior nodes lead to branches.

πŸŽ‰ Trees can easily be displayed graphically no matter the dimension of the data.

πŸŽ‰ Trees can easily handle qualitative predictors without the need to create dummy variables.
πŸŽ‰ Trees can easily handle categorical predictors without the need to create one-hot encodings.

πŸŽ‰ *Trees are GREAT for missing data!!!*

πŸ’© Trees aren't very good at prediction.

πŸ’© Full trees badly overfit, so we "prune" them using CV
πŸ’© Big trees badly overfit, so we "prune" them using CV

. . .
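As a small demonstration of two of the πŸŽ‰ points (a sketch with simulated data, using `rpart` rather than the `tree` package used elsewhere in these slides): a factor predictor goes in as-is, and rows with missing values are still routed down the tree via rpart's default surrogate splits.

```{r}
#| code-fold: true
# Illustrative sketch: a factor predictor (no one-hot encoding needed) plus
# missing values in x, handled by rpart's surrogate splits.
library(rpart)
set.seed(1)
df <- data.frame(
  group = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
  x = rnorm(200)
)
df$y <- ifelse(df$group == "c", 2, 0) + df$x + rnorm(200, sd = 0.3)
df$x[sample(200, 20)] <- NA                      # inject some missingness
fit <- rpart(y ~ group + x, data = df)
predict(fit, newdata = head(df[is.na(df$x), ]))  # predictions for rows with missing x
```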

