
Commit

rebuild bagging
dajmcdon committed Oct 30, 2023
1 parent 42aa8f2 commit 38b2cee
Showing 12 changed files with 3,185 additions and 175 deletions.

Diffs for 10 of the 12 changed files are not rendered (too large or not displayable).
36 changes: 34 additions & 2 deletions schedule/slides/17-nonlinear-classifiers.qmd
@@ -346,14 +346,46 @@ err <- map_dbl(1:kmax, ~ mean(knn.cv(dat1[, -1], dat1$y, k = .x) != dat1$y))
```{r}
#| echo: false
ggplot(data.frame(k = 1:kmax, error = err), aes(k, error)) +
-  geom_point(color = red) +
-  geom_line(color = red)
+  geom_point(color = orange) +
+  geom_line(color = orange)
```

I would use the _largest_ (odd) `k` that is close to the minimum.
This produces simpler, smoother decision boundaries.

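The rule above can be made concrete. A minimal sketch, assuming a tolerance `tol` for "close to the minimum" (the slides do not fix one; the one-standard-error rule is a common alternative). `pick_k()` is a hypothetical helper, not from any package, and it assumes `err[k]` holds the CV error for `k` neighbours, as in the `err` vector computed above.

```{r}
# Hypothetical helper: largest odd k whose CV error is within `tol` of the minimum.
# Assumes err[k] is the CV error for k neighbours (k = 1, 2, ...).
pick_k <- function(err, tol = 0.01) {
  k <- seq_along(err)
  near_min <- err <= min(err) + tol # "close to the minimum"
  cand <- k[near_min & k %% 2 == 1] # odd k only (avoids tied votes with 2 classes)
  if (length(cand) == 0) return(which.min(err)) # fall back to the argmin
  max(cand) # largest k => smoother boundary
}

pick_k(c(0.30, 0.20, 0.215, 0.25, 0.205), tol = 0.01) # returns 5
```

With the slides' objects this would be `pick_k(err)`. Preferring the largest qualifying `k` trades a negligible increase in CV error for a less variable classifier.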

## Alternative (using deviance loss, I think this is right)

```{r}
#| code-fold: true
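dev <- function(y, prob, prob_min = 1e-5) {
  y <- as.numeric(as.factor(y)) - 1 # 0/1 valued
  m <- mean(y) # base rate (intercept-only model)
  prob_max <- 1 - prob_min
  prob <- pmin(pmax(prob, prob_min), prob_max) # keep log() finite
  lp <- (1 - y) * log(1 - prob) + y * log(prob) # log-lik of predicted probs
  ly <- (1 - y) * log(1 - m) + y * log(m) # log-lik of the base-rate model
  2 * (ly - lp) # deviance relative to the base-rate model
}
knn.cv_probs <- function(train, cl, k = 1) {
  o <- knn.cv(train, cl, k = k, prob = TRUE)
  p <- attr(o, "prob") # vote share of the *winning* class
  o <- as.numeric(as.factor(o)) - 1
  p[o == 0] <- 1 - p[o == 0] # convert to P(y = second level)
  p
}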
dev_err <- map_dbl(1:kmax, ~ mean(dev(dat1$y, knn.cv_probs(dat1[, -1], dat1$y, k = .x))))
```

```{r}
#| echo: false
ggplot(data.frame(k = 1:kmax, error = dev_err), aes(k, error)) +
  geom_point(color = orange) +
  geom_line(color = orange)
```
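One way to sanity-check `dev()`: it returns twice the per-observation log-likelihood of the base-rate model (predicting `mean(y)` for everyone) minus that of the supplied probabilities. So predicting the base rate everywhere should give exactly 0, better-than-base-rate probabilities should give a negative mean, and worse ones a positive mean. Because the base-rate term does not depend on `k`, `mean(dev(...))` is the average binomial deviance shifted by a constant, so it ranks values of `k` the same way. The toy labels `y` below are made up for illustration.

```{r}
# Copy of dev() from above, so this chunk runs on its own
dev <- function(y, prob, prob_min = 1e-5) {
  y <- as.numeric(as.factor(y)) - 1 # 0/1 valued
  m <- mean(y)
  prob_max <- 1 - prob_min
  prob <- pmin(pmax(prob, prob_min), prob_max)
  lp <- (1 - y) * log(1 - prob) + y * log(prob)
  ly <- (1 - y) * log(1 - m) + y * log(m)
  2 * (ly - lp)
}

y <- c(0, 0, 1, 1, 1) # toy labels, base rate 0.6

# Predicting the base rate everywhere: every entry is 0
dev(y, rep(mean(y), length(y)))

# Confident, correct probabilities beat the base rate: negative mean
mean(dev(y, ifelse(y == 1, 0.9, 0.1)))

# The same probabilities flipped are worse than the base rate: positive mean
mean(dev(y, ifelse(y == 1, 0.1, 0.9)))
```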




## Final version

20 changes: 18 additions & 2 deletions schedule/slides/18-the-bootstrap.qmd
@@ -92,10 +92,26 @@ Can I get a 95% confidence interval for $Pr(y_0=1 \given x_0)$?
$\hat{F}$ is the "empirical" distribution of the bootstraps.


## Empirical distribution

```{r}
#| code-fold: true
r <- rexp(50, 1 / 5)
ggplot(tibble(r = r), aes(r)) +
  stat_ecdf(colour = orange) +
  geom_vline(xintercept = quantile(r, probs = c(.05, .95))) +
  geom_hline(yintercept = c(.05, .95), linetype = "dashed") +
  annotate(
    "label", x = c(5, 12), y = c(.25, .75),
    label = c("hat(F)[boot](.05)", "hat(F)[boot](.95)"),
    parse = TRUE
  )
```
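The plotted curve is the empirical CDF $\hat{F}(x)$, the fraction of sample points at or below $x$, and the vertical lines are its quantiles (values of $\hat{F}^{-1}$). A minimal base-R sketch of the same objects using `ecdf()` and `quantile()`. It uses `type = 1` because that quantile type is the exact inverse of the empirical CDF (the default `type = 7` interpolates between order statistics); the seed is arbitrary.

```{r}
set.seed(406) # arbitrary seed, for reproducibility
r <- rexp(50, rate = 1 / 5) # same sampling model as above: mean 5

F_hat <- ecdf(r) # step function: F_hat(x) = mean(r <= x)
F_hat(5)
mean(r <= 5) # same value as the line above

# Type 1 quantiles invert the ECDF: smallest x with F_hat(x) >= p
q <- quantile(r, probs = c(.05, .95), type = 1)
F_hat(q) # each value is at least the requested probability
```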


## Very basic example

-* Let $X_i\sim Exponential(1/5)$. The pdf is $f(x) = \frac{1}{5}e^{-x/5}$
+* Let $X_i\sim \textrm{Exponential}(1/5)$. The pdf is $f(x) = \frac{1}{5}e^{-x/5}$


* I know if I estimate the mean with $\bar{X}$, then by the CLT (if $n$ is big),
@@ -368,7 +384,7 @@ where $\theta^*_q$ is the $q$ quantile of $\hat{\Theta}$.
Let $\hat{\theta}$ be our sample statistic and $\hat{\Theta}$ the statistics computed on the resamples

$$
-[\hat{\theta} - t_{\alpha/2}\hat{s},\ \hat{\theta} - t_{\alpha/2}\hat{s}]
+[\hat{\theta} - t_{\alpha/2}\hat{s},\ \hat{\theta} + t_{\alpha/2}\hat{s}]
$$

where $\hat{s} = \sqrt{\Var{\hat{\Theta}}}$
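A minimal sketch of the symmetric interval $\hat{\theta} \pm t_{\alpha/2}\hat{s}$ for the exponential example. Assumptions: $\hat{\theta}$ is the sample mean, $B = 1000$ resamples, and $t_{\alpha/2}$ is taken as the upper $\alpha/2$ critical value of a $t_{n-1}$ distribution via `qt()`; the slides do not pin down the degrees of freedom, so treat that choice as illustrative. The seed and $\alpha = 0.05$ are arbitrary.

```{r}
set.seed(406) # arbitrary seed
n <- 50
x <- rexp(n, rate = 1 / 5) # Exponential with mean 5
alpha <- 0.05
B <- 1000 # number of bootstrap resamples (assumed)

theta_hat <- mean(x) # sample statistic
# Theta_hat: the statistic recomputed on B resamples drawn with replacement
Theta_hat <- replicate(B, mean(sample(x, replace = TRUE)))
s_hat <- sd(Theta_hat) # sqrt(var(Theta_hat)): bootstrap standard error

t_crit <- qt(1 - alpha / 2, df = n - 1) # assumed: upper alpha/2 t critical value
ci <- c(lower = theta_hat - t_crit * s_hat, upper = theta_hat + t_crit * s_hat)
ci
```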
