diff --git a/.nojekyll b/.nojekyll
index 556072d..6026aa1 100644
--- a/.nojekyll
+++ b/.nojekyll
@@ -1 +1 @@
-19c57bc7
\ No newline at end of file
+a9c76953
\ No newline at end of file
diff --git a/schedule/slides/14-classification-intro.html b/schedule/slides/14-classification-intro.html
index 2fb0125..2c32215 100644
--- a/schedule/slides/14-classification-intro.html
+++ b/schedule/slides/14-classification-intro.html
@@ -398,7 +398,7 @@
-The Set-up
+
+Setup
It begins just like regression: suppose we have observations \[\{(x_1,y_1),\ldots,(x_n,y_n)\}\]
Again, we want to estimate a function that maps \(X\) to \(Y\) to predict as-yet unobserved data.
(This function is known as a classifier)
@@ -450,10 +456,10 @@ The Set-up
How do we measure quality?
-Before in regression, we have \(y_i \in \mathbb{R}\) and use squared error loss to measure accuracy: \((y - \hat{y})^2\) .
+Before, in regression, we had \(y_i \in \mathbb{R}\) and used \((y - \hat{y})^2\) loss to measure accuracy.
Instead, let \(y \in \mathcal{K} = \{1,\ldots, K\}\)
(This is arbitrary; sometimes other numbers, such as \(\{-1,1\}\), will be used)
-We can always take “factors”: \(\{\textrm{cat},\textrm{dog}\}\) and convert to integers, which is what we assume.
+We will usually convert categories/“factors” (e.g. \(\{\textrm{cat},\textrm{dog}\}\) ) to integers.
We again make predictions \(\hat{y}=k\) based on the data
We get zero loss if we predict the right class
@@ -462,7 +468,56 @@ How do we measure quality?
How do we measure quality?
-Suppose you have a fever of 39º C. You get a rapid test on campus.
+Example: You’re trying to build a fun widget to classify images of cats and dogs.
+
+Loss          Predict Dog    Predict Cat
+Actual Dog    0              ?
+Actual Cat    ?              0
+
Use the zero-one loss (1 if wrong, 0 if right). Type of error doesn’t matter.
+
+Loss          Predict Dog    Predict Cat
+Actual Dog    0              1
+Actual Cat    1              0
+
+How do we measure quality?
+Example: Suppose you have a fever of 39º C. You get a rapid test on campus.
-Are +    0                Infect others
-Are -    Isolation        0
-
-How do we measure quality?
-Suppose you have a fever of 39º C. You get a rapid test on campus.
+Loss     Test +           Test -
+Are +    0                ? (Infect others)
+Are -    ? (Isolation)    0
+
Use a weighted loss; type of error matters!
-Are +    0    1
+Are +    0    (LARGE)
Are -
@@ -509,34 +562,26 @@ How do we measure quality?
+
Note that one class is “important”: we sometimes call that one positive. Errors are false positive and false negative.
+
In practice, you have to design your loss (just like before) to reflect what you care about.
+
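As a concrete illustration of designing a weighted loss, here is a minimal R sketch (the names are purely hypothetical; it assumes 0/1 vectors y and yhat, with a false negative costing w_fn times a false positive):

# Hedged sketch: weighted 0-1 loss; y and yhat are assumed to be 0/1 vectors
weighted_loss <- function(y, yhat, w_fn = 10) {
  mean(ifelse(y == 1 & yhat == 0, w_fn,    # false negative (e.g., infect others)
       ifelse(y == 0 & yhat == 1, 1, 0)))  # false positive (e.g., isolation)
}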
How do we measure quality?
-
We’re going to use \(g(x)\) to denote our classifier. It takes values in \(\mathcal{K}\).
-
+Consider the risk \[R_n(g) = E[\ell(Y,g(X))]\] If we use the law of total probability, this can be written \[R_n(g) = E\left[\sum_{y=1}^K \ell(y,\; g(X)) Pr(Y = y \given X)\right]\] We minimize this over a class of options \(\mathcal{G}\), to produce \[g_*(X) = \argmin_{g\in\mathcal{G}} E\left[\sum_{y=1}^K \ell(y,g(X)) Pr(Y = y \given X)\right]\]
How do we measure quality?
-Again, we appeal to risk \[R_n(g) = E [\ell(Y,g(X))]\] If we use the law of total probability, this can be written \[R_n(g) = E_X \sum_{y=1}^K \ell(y,\; g(X)) Pr(Y = y \given X)\] We minimize this over a class of options \(\mathcal{G}\) , to produce \[g_*(X) = \argmin_{g\in\mathcal{G}} E_X \sum_{y=1}^K \ell(y,g(X)) Pr(Y = y \given X)\]
-
-
-How do we measure quality?
\(g_*\) is named the Bayes’ classifier for loss \(\ell\) in class \(\mathcal{G}\).
\(R_n(g_*)\) is called the Bayes’ limit or Bayes’ Risk.
-It’s the best we could hope to do in terms of \(\ell\) if we knew the distribution of the data.
-
+
It’s the best we could hope to do even if we knew the distribution of the data (recall irreducible error!)
But we don’t, so we’ll try to do our best to estimate \(g_*\).
-
Best classifier overall
-(for now, we limit to 2 classes)
-Once we make a specific choice for \(\ell\) , we can find \(g_*\) exactly (pretending we know the distribution)
-Because \(Y\) takes only a few values, zero-one loss is natural (but not the only option) \[\ell(y,\ g(x)) = \begin{cases}0 & y=g(x)\\1 & y\neq g(x) \end{cases} \Longrightarrow R_n(g) = \Expect{\ell(Y,\ g(X))} = Pr(g(X) \neq Y),\]
-
-
-Best classifier overall
+Suppose we actually know the distribution of everything, and we’ve picked \(\ell\) to be the zero-one loss
+\[\ell(y,\ g(x)) = \begin{cases}0 & y=g(x)\\1 & y\neq g(x) \end{cases}\]
+Then
+\[R_n(g) = \Expect{\ell(Y,\ g(X))} = Pr(g(X) \neq Y)\]
-
+
Best classifier overall
-This means we want to classify a new observation \((x_0,y_0)\) such that \(g(x_0) = y_0\) as often as possible
-Under this loss, we have \[
+Want to classify a new observation \((X,Y)\) such that \(g(X) = Y\) with as high a probability as possible. Under zero-one loss, we have
+\[g_* = \argmin_{g} \Pr(g(X) \neq Y) = \argmin_g \left[1 - \Pr(g(X) = Y)\right] = \argmax_g \Pr(g(X) = Y)\]
+
+
\[
\begin{aligned}
-g_*(X) &= \argmin_{g} Pr(g(X) \neq Y) \\
-&= \argmin_{g} \left[ 1 - Pr(Y = g(x) | X=x)\right] \\
-&= \argmax_{g} Pr(Y = g(x) | X=x )
+g_* &= \argmax_{g} E[\Pr(g(X) = Y | X)]\\
+&= \argmax_{g} E\left[\sum_{k\in\mathcal{K}}1[g(X) = k]\Pr(Y=k | X)\right]
\end{aligned}
\]
+
+
+
For each \(x\), only one \(k\) can satisfy \(g(x) = k\). So for each \(x\),
+
\[
+g_*(x) = \argmax_{k\in\mathcal{K}} \Pr(Y = k | X = x).
+\]
+
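If the conditional class probabilities were somehow known, the argmax rule above is immediate to apply; a one-line R sketch (probs is a hypothetical n-by-K matrix whose (i, k) entry is Pr(Y = k | X = x_i)):

# Row-wise argmax implements g_*(x) = argmax_k Pr(Y = k | X = x)
bayes_rule <- function(probs) max.col(probs)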
-
-Estimating \(g_*\)
-Classifier approach 1 (empirical risk minimization):
+
+Estimating \(g_*\) Approach 1: Empirical risk minimization
Choose some class of classifiers \(\mathcal{G}\).
Find \(\argmin_{g\in\mathcal{G}} \sum_{i = 1}^n I(g(x_i) \neq y_i)\)
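A minimal sketch of empirical risk minimization in R, using a toy family of threshold classifiers on a single numeric feature (x and y are assumed to be the feature and 0/1 labels; the names are illustrative only):

# ERM over g_c(x) = 1{x > c}: pick the cutoff minimizing training 0-1 loss
cutoffs   <- sort(unique(x))
train_err <- sapply(cutoffs, function(cc) mean(as.integer(x > cc) != y))
c_hat     <- cutoffs[which.min(train_err)]
ghat      <- function(xnew) as.integer(xnew > c_hat)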
-
-Bayes’ Classifier and class densities (2 classes)
-Using Bayes’ theorem , and recalling that \(f_*(X) = E[Y \given X]\)
+
+Estimating \(g_*\) Approach 2: Class densities
+Consider 2 classes \(\{0,1\}\): using Bayes’ theorem (and being loose with notation),
\[\begin{aligned}
-f_*(X) & = E[Y \given X] = Pr(Y = 1 \given X) \\
-&= \frac{Pr(X\given Y=1) Pr(Y=1)}{Pr(X)}\\
-& =\frac{Pr(X\given Y = 1) Pr(Y = 1)}{\sum_{k \in \{0,1\}} Pr(X\given Y = k) Pr(Y = k)} \\ & = \frac{p_1(X) \pi}{ p_1(X)\pi + p_0(X)(1-\pi)}\end{aligned}\]
+\Pr(Y=1 \given X=x) &= \frac{\Pr(X=x\given Y=1) \Pr(Y=1)}{\Pr(X=x)}\\
+&=\frac{\Pr(X=x\given Y = 1) \Pr(Y = 1)}{\sum_{k \in \{0,1\}} \Pr(X=x\given Y = k) \Pr(Y = k)} \\
+&= \frac{p_1(x) \pi}{ p_1(x)\pi + p_0(x)(1-\pi)}\end{aligned}\]
-We call \(p_k(X)\) the class (conditional) densities
+We call \(p_k(x)\) the class (conditional) densities
\(\pi\) is the marginal probability \(P(Y=1)\)
+Similar formula for \(\Pr(Y=0\given X=x) = p_0(x)(1-\pi)/(\dots)\)
-
-Bayes’ Classifier and class densities (2 classes)
-The Bayes’ Classifier (best classifier for 0-1 loss) can be rewritten
+
+Estimating \(g_*\) Approach 2: Class densities
+Recall \(g_*(x) = \argmax_k \Pr(Y=k|x)\); so we classify \(x\) as 1 if
+\[\frac{p_1(x) \pi}{ p_1(x)\pi + p_0(x)(1-\pi)} > \frac{p_0(x) (1-\pi)}{ p_1(x)\pi + p_0(x)(1-\pi)}\]
+i.e., the Bayes’ Classifier (best classifier for 0-1 loss) can be rewritten
\[g_*(X) = \begin{cases}
1 & \textrm{ if } \frac{p_1(X)}{p_0(X)} > \frac{1-\pi}{\pi} \\
0 & \textrm{ otherwise}
\end{cases}\]
-Approach 2: estimate everything in the expression above.
+Estimate everything in the expression above.
-We need to estimate \(p_1\) , \(p_2\) , \(\pi\) , \(1-\pi\)
+We need to estimate \(p_0\), \(p_1\), \(\pi\), \(1-\pi\)
Easily extended to more than two classes
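One rough way to carry out Approach 2 with a single predictor is to plug in kernel density estimates for the class densities; a hedged R sketch (x and y are again an assumed numeric feature and 0/1 labels):

pi_hat <- mean(y == 1)                # estimate of pi = Pr(Y = 1)
p1 <- approxfun(density(x[y == 1]))   # estimated density of X given Y = 1
p0 <- approxfun(density(x[y == 0]))   # estimated density of X given Y = 0
ghat <- function(xnew) as.integer(p1(xnew) / p0(xnew) > (1 - pi_hat) / pi_hat)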
-
-An alternative easy classifier
-Zero-One loss was natural, but try something else
-Let’s try using squared error loss instead: \(\ell(y,\ f(x)) = (y - f(x))^2\)
-Then, the Bayes’ Classifier (the function that minimizes the Bayes Risk) is \[g_*(x) = f_*(x) = E[ Y \given X = x] = Pr(Y = 1 \given X)\] (recall that \(f_* \in [0,1]\) is still the regression function)
-In this case, our “class” will actually just be a probability. But this isn’t a class, so it’s a bit unsatisfying.
-How do we get a class prediction?
+
+Estimating \(g_*\) Approach 3: Regression discretization
+0-1 loss is natural, but discrete. Let’s try using squared error instead: \(\ell(y,\ f(x)) = (y - f(x))^2\)
+What will be the optimal classifier here? (hint: think about regression)
-
Discretize the probability:
-
\[g(x) = \begin{cases}0 & f_*(x) < 1/2\\1 & \textrm{else}\end{cases}\]
+
The “Bayes’ Classifier” (sort of…minimizes risk) is just the regression function! \[f_*(x) = \Pr(Y = 1 \given X=x) = E[ Y \given X = x] \]
+
In this case, \(0\leq f_*(x)\leq 1\) is not discrete… How do we get a class prediction?
-
-
-Estimating \(g_*\)
-Approach 3:
+
+
Discretize the output:
+
\[g(x) = \begin{cases}0 & f_*(x) < 1/2\\1 & \textrm{else}\end{cases}\]
-Estimate \(f_*\) using any method we’ve learned so far.
+Estimate \(\hat f(x) \approx E[Y|X=x] = \Pr(Y=1|X=x)\) using any method we’ve learned so far.
Predict 0 if \(\hat{f}(x)\) is less than 1/2, else predict 1.
+
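A minimal sketch of Approach 3 in R, assuming a hypothetical data frame trainingdata with a 0/1 response y and numeric predictors (any regression method could replace lm):

fhat <- lm(y ~ ., data = trainingdata)                               # estimate E[Y | X = x]
ghat <- function(newdata) as.integer(predict(fhat, newdata) >= 0.5)  # threshold at 1/2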
Claim: Classification is easier than regression
@@ -686,30 +740,29 @@ Claim: Classification is easier than regression
How to find a classifier
-Why did we go through that math?
-Each of these approaches suggests a way to find a classifier
+Why did we go through that math?
+Each of these approaches has strengths/drawbacks:
-Empirical risk minimization: Choose a set of classifiers \(\mathcal{G}\) and find \(g \in \mathcal{G}\) that minimizes some estimate of \(R_n(g)\)
+Empirical risk minimization: Minimize an estimate of \(R_n(g)\) over some family \(\mathcal{G}\)
(This can be quite challenging as, unlike in regression, the training error is nonconvex)
-
-
-
-Easiest classifier when \(y\in \{0,\ 1\}\) :
-(stupidest version of the third case…)
-
-
ghat <- round(predict(lm(y ~ ., data = trainingdata)))
-
-Think about why this may not be very good. (At least 2 reasons I can think of.)
+
+(We have to estimate class densities to classify. Too roundabout?)
+
+
+Regression: Find an estimate \(\hat{f}\approx E[Y|X=x]\) and compare the predicted value to 1/2
+
+
+(Unnatural, estimates whole regression function when we’ll just discretize anyway)
+
-Next time:
+Next time…
Estimating the densities
diff --git a/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-1-1.svg b/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-1-1.svg
index b9af09d..8f20477 100644
--- a/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-1-1.svg
+++ b/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-1-1.svg
@@ -1,369 +1,371 @@
diff --git a/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-2-1.svg b/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-2-1.svg
index 72a1074..77852f1 100644
--- a/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-2-1.svg
+++ b/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-2-1.svg
@@ -1,375 +1,377 @@
diff --git a/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-3-1.svg b/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-3-1.svg
index 7f4e494..71c92ea 100644
--- a/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-3-1.svg
+++ b/schedule/slides/14-classification-intro_files/figure-revealjs/unnamed-chunk-3-1.svg
@@ -1,699 +1,701 @@
diff --git a/search.json b/search.json
index 917021d..35aa502 100644
--- a/search.json
+++ b/search.json
@@ -1582,165 +1582,67 @@
"text": "Dumb example\n\n\\(K = 3\\)\n\n\nkm <- kmeans(clust_raw, 3, nstart = 20)\nnames(km)\n\n[1] \"cluster\" \"centers\" \"totss\" \"withinss\" \"tot.withinss\"\n[6] \"betweenss\" \"size\" \"iter\" \"ifault\" \n\ncenters <- as_tibble(km$centers, .name_repair = \"unique\")"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#section",
- "href": "schedule/slides/14-classification-intro.html#section",
- "title": "UBC Stat406 2024W",
- "section": "14 Classification",
- "text": "14 Classification\nStat 406\nGeoff Pleiss, Trevor Campbell\nLast modified – 09 October 2023\n\\[\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\]"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#an-overview-of-classification",
- "href": "schedule/slides/14-classification-intro.html#an-overview-of-classification",
- "title": "UBC Stat406 2024W",
- "section": "An Overview of Classification",
- "text": "An Overview of Classification\n\nA person arrives at an emergency room with a set of symptoms that could be 1 of 3 possible conditions. Which one is it?\nAn online banking service must be able to determine whether each transaction is fraudulent or not, using a customer’s location, past transaction history, etc.\nGiven a set of individuals sequenced DNA, can we determine whether various mutations are associated with different phenotypes?\n\n\nThese problems are not regression problems. They are classification problems."
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#the-set-up",
- "href": "schedule/slides/14-classification-intro.html#the-set-up",
- "title": "UBC Stat406 2024W",
- "section": "The Set-up",
- "text": "The Set-up\nIt begins just like regression: suppose we have observations \\[\\{(x_1,y_1),\\ldots,(x_n,y_n)\\}\\]\nAgain, we want to estimate a function that maps \\(X\\) to \\(Y\\) to predict as yet observed data.\n(This function is known as a classifier)\nThe same constraints apply:\n\nWe want a classifier that predicts test data, not just the training data.\nOften, this comes with the introduction of some bias to get lower variance and better predictions."
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality",
- "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality",
- "title": "UBC Stat406 2024W",
- "section": "How do we measure quality?",
- "text": "How do we measure quality?\nBefore in regression, we have \\(y_i \\in \\mathbb{R}\\) and use squared error loss to measure accuracy: \\((y - \\hat{y})^2\\).\nInstead, let \\(y \\in \\mathcal{K} = \\{1,\\ldots, K\\}\\)\n(This is arbitrary, sometimes other numbers, such as \\(\\{-1,1\\}\\) will be used)\nWe can always take “factors”: \\(\\{\\textrm{cat},\\textrm{dog}\\}\\) and convert to integers, which is what we assume.\nWe again make predictions \\(\\hat{y}=k\\) based on the data\n\nWe get zero loss if we predict the right class\nWe lose \\(\\ell(k,k')\\) on \\((k\\neq k')\\) for incorrect predictions"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-1",
- "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-1",
- "title": "UBC Stat406 2024W",
- "section": "How do we measure quality?",
- "text": "How do we measure quality?\nSuppose you have a fever of 39º C. You get a rapid test on campus.\n\n\n\nLoss\nTest +\nTest -\n\n\n\n\nAre +\n0\nInfect others\n\n\nAre -\nIsolation\n0"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-2",
- "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-2",
- "title": "UBC Stat406 2024W",
- "section": "How do we measure quality?",
- "text": "How do we measure quality?\nSuppose you have a fever of 39º C. You get a rapid test on campus.\n\n\n\nLoss\nTest +\nTest -\n\n\n\n\nAre +\n0\n1\n\n\nAre -\n1\n0"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-3",
- "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-3",
- "title": "UBC Stat406 2024W",
- "section": "How do we measure quality?",
- "text": "How do we measure quality?\n\nWe’re going to use \\(g(x)\\) to be our classifier. It takes values in \\(\\mathcal{K}\\)."
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-4",
- "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-4",
- "title": "UBC Stat406 2024W",
- "section": "How do we measure quality?",
- "text": "How do we measure quality?\nAgain, we appeal to risk \\[R_n(g) = E [\\ell(Y,g(X))]\\] If we use the law of total probability, this can be written \\[R_n(g) = E_X \\sum_{y=1}^K \\ell(y,\\; g(X)) Pr(Y = y \\given X)\\] We minimize this over a class of options \\(\\mathcal{G}\\), to produce \\[g_*(X) = \\argmin_{g\\in\\mathcal{G}} E_X \\sum_{y=1}^K \\ell(y,g(X)) Pr(Y = y \\given X)\\]"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-5",
- "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-5",
- "title": "UBC Stat406 2024W",
- "section": "How do we measure quality?",
- "text": "How do we measure quality?\n\\(g_*\\) is named the Bayes’ classifier for loss \\(\\ell\\) in class \\(\\mathcal{G}\\).\n\\(R_n(g_*)\\) is the called the Bayes’ limit or Bayes’ Risk.\nIt’s the best we could hope to do in terms of \\(\\ell\\) if we knew the distribution of the data.\n\nBut we don’t, so we’ll try to do our best to estimate \\(g_*\\)."
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#best-classifier-overall",
- "href": "schedule/slides/14-classification-intro.html#best-classifier-overall",
- "title": "UBC Stat406 2024W",
- "section": "Best classifier overall",
- "text": "Best classifier overall\n(for now, we limit to 2 classes)\nOnce we make a specific choice for \\(\\ell\\), we can find \\(g_*\\) exactly (pretending we know the distribution)\nBecause \\(Y\\) takes only a few values, zero-one loss is natural (but not the only option) \\[\\ell(y,\\ g(x)) = \\begin{cases}0 & y=g(x)\\\\1 & y\\neq g(x) \\end{cases} \\Longrightarrow R_n(g) = \\Expect{\\ell(Y,\\ g(X))} = Pr(g(X) \\neq Y),\\]"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#best-classifier-overall-1",
- "href": "schedule/slides/14-classification-intro.html#best-classifier-overall-1",
- "title": "UBC Stat406 2024W",
- "section": "Best classifier overall",
- "text": "Best classifier overall\n\n\n\nLoss\nTest +\nTest -\n\n\n\n\nAre +\n0\n1\n\n\nAre -\n1\n0"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#best-classifier-overall-2",
- "href": "schedule/slides/14-classification-intro.html#best-classifier-overall-2",
- "title": "UBC Stat406 2024W",
- "section": "Best classifier overall",
- "text": "Best classifier overall\nThis means we want to classify a new observation \\((x_0,y_0)\\) such that \\(g(x_0) = y_0\\) as often as possible\nUnder this loss, we have \\[\n\\begin{aligned}\ng_*(X) &= \\argmin_{g} Pr(g(X) \\neq Y) \\\\\n&= \\argmin_{g} \\left[ 1 - Pr(Y = g(x) | X=x)\\right] \\\\\n&= \\argmax_{g} Pr(Y = g(x) | X=x )\n\\end{aligned}\n\\]"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#estimating-g_",
- "href": "schedule/slides/14-classification-intro.html#estimating-g_",
- "title": "UBC Stat406 2024W",
- "section": "Estimating \\(g_*\\)",
- "text": "Estimating \\(g_*\\)\nClassifier approach 1 (empirical risk minimization):\n\nChoose some class of classifiers \\(\\mathcal{G}\\).\nFind \\(\\argmin_{g\\in\\mathcal{G}} \\sum_{i = 1}^n I(g(x_i) \\neq y_i)\\)"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#bayes-classifier-and-class-densities-2-classes",
- "href": "schedule/slides/14-classification-intro.html#bayes-classifier-and-class-densities-2-classes",
- "title": "UBC Stat406 2024W",
- "section": "Bayes’ Classifier and class densities (2 classes)",
- "text": "Bayes’ Classifier and class densities (2 classes)\nUsing Bayes’ theorem, and recalling that \\(f_*(X) = E[Y \\given X]\\)\n\\[\\begin{aligned}\nf_*(X) & = E[Y \\given X] = Pr(Y = 1 \\given X) \\\\\n&= \\frac{Pr(X\\given Y=1) Pr(Y=1)}{Pr(X)}\\\\\n& =\\frac{Pr(X\\given Y = 1) Pr(Y = 1)}{\\sum_{k \\in \\{0,1\\}} Pr(X\\given Y = k) Pr(Y = k)} \\\\ & = \\frac{p_1(X) \\pi}{ p_1(X)\\pi + p_0(X)(1-\\pi)}\\end{aligned}\\]\n\nWe call \\(p_k(X)\\) the class (conditional) densities\n\\(\\pi\\) is the marginal probability \\(P(Y=1)\\)"
- },
- {
- "objectID": "schedule/slides/14-classification-intro.html#bayes-classifier-and-class-densities-2-classes-1",
- "href": "schedule/slides/14-classification-intro.html#bayes-classifier-and-class-densities-2-classes-1",
+ "objectID": "schedule/slides/22-nnets-estimation.html#section",
+ "href": "schedule/slides/22-nnets-estimation.html#section",
"title": "UBC Stat406 2024W",
- "section": "Bayes’ Classifier and class densities (2 classes)",
- "text": "Bayes’ Classifier and class densities (2 classes)\nThe Bayes’ Classifier (best classifier for 0-1 loss) can be rewritten\n\\[g_*(X) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(X)}{p_0(X)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}\\]\nApproach 2: estimate everything in the expression above.\n\nWe need to estimate \\(p_1\\), \\(p_2\\), \\(\\pi\\), \\(1-\\pi\\)\nEasily extended to more than two classes"
+ "section": "22 Neural nets - estimation",
+ "text": "22 Neural nets - estimation\nStat 406\nGeoff Pleiss, Trevor Campbell\nLast modified – 16 November 2023\n\\[\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n\\]"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#an-alternative-easy-classifier",
- "href": "schedule/slides/14-classification-intro.html#an-alternative-easy-classifier",
+ "objectID": "schedule/slides/22-nnets-estimation.html#neural-network-terms-again-t-hidden-layers-regression",
+ "href": "schedule/slides/22-nnets-estimation.html#neural-network-terms-again-t-hidden-layers-regression",
"title": "UBC Stat406 2024W",
- "section": "An alternative easy classifier",
- "text": "An alternative easy classifier\nZero-One loss was natural, but try something else\nLet’s try using squared error loss instead: \\(\\ell(y,\\ f(x)) = (y - f(x))^2\\)\nThen, the Bayes’ Classifier (the function that minimizes the Bayes Risk) is \\[g_*(x) = f_*(x) = E[ Y \\given X = x] = Pr(Y = 1 \\given X)\\] (recall that \\(f_* \\in [0,1]\\) is still the regression function)\nIn this case, our “class” will actually just be a probability. But this isn’t a class, so it’s a bit unsatisfying.\nHow do we get a class prediction?\n\nDiscretize the probability:\n\\[g(x) = \\begin{cases}0 & f_*(x) < 1/2\\\\1 & \\textrm{else}\\end{cases}\\]"
+ "section": "Neural Network terms again (T hidden layers, regression)",
+ "text": "Neural Network terms again (T hidden layers, regression)\n\n\n\\[\n\\begin{aligned}\nA_{k}^{(1)} &= g\\left(\\sum_{j=1}^p w^{(1)}_{k,j} x_j\\right)\\\\\nA_{\\ell}^{(t)} &= g\\left(\\sum_{k=1}^{K_{t-1}} w^{(t)}_{\\ell,k} A_{k}^{(t-1)} \\right)\\\\\n\\hat{Y} &= z_m = \\sum_{\\ell=1}^{K_T} \\beta_{m,\\ell} A_{\\ell}^{(T)}\\ \\ (M = 1)\n\\end{aligned}\n\\]\n\n\\(B \\in \\R^{M\\times K_T}\\).\n\\(M=1\\) for regression\n\n\\(\\mathbf{W}_t \\in \\R^{K_2\\times K_1}\\) \\(t=1,\\ldots,T\\)"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#estimating-g_-1",
- "href": "schedule/slides/14-classification-intro.html#estimating-g_-1",
+ "objectID": "schedule/slides/22-nnets-estimation.html#training-neural-networks.-first-choices",
+ "href": "schedule/slides/22-nnets-estimation.html#training-neural-networks.-first-choices",
"title": "UBC Stat406 2024W",
- "section": "Estimating \\(g_*\\)",
- "text": "Estimating \\(g_*\\)\nApproach 3:\n\nEstimate \\(f_*\\) using any method we’ve learned so far.\nPredict 0 if \\(\\hat{f}(x)\\) is less than 1/2, else predict 1."
+ "section": "Training neural networks. First, choices",
+ "text": "Training neural networks. First, choices\n\nChoose the architecture: how many layers, units per layer, what connections?\nChoose the loss: common choices (for each data point \\(i\\))\n\n\nRegression\n\n\\(\\hat{R}_i = \\frac{1}{2}(y_i - \\hat{y}_i)^2\\) (the 1/2 just makes the derivative nice)\n\nClassification\n\n\\(\\hat{R}_i = I(y_i = m)\\log( 1 + \\exp(-z_{im}))\\)\n\n\n\nChoose the activation function \\(g\\)"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression",
- "href": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression",
+ "objectID": "schedule/slides/22-nnets-estimation.html#training-neural-networks-intuition",
+ "href": "schedule/slides/22-nnets-estimation.html#training-neural-networks-intuition",
"title": "UBC Stat406 2024W",
- "section": "Claim: Classification is easier than regression",
- "text": "Claim: Classification is easier than regression\n\nLet \\(\\hat{f}\\) be any estimate of \\(f_*\\)\nLet \\(\\widehat{g} (x) = \\begin{cases}0 & \\hat f(x) < 1/2\\\\1 & else\\end{cases}\\)\n\nProof by picture."
+ "section": "Training neural networks (intuition)",
+ "text": "Training neural networks (intuition)\n\nWe need to estimate \\(B\\), \\(\\mathbf{W}_t\\), \\(t=1,\\ldots,T\\)\nWe want to minimize \\(\\hat{R} = \\sum_{i=1}^n \\hat{R}_i\\) as a function of all this.\nWe use gradient descent, but in this dialect, we call it back propagation\n\n\n\nDerivatives via the chain rule: computed by a forward and backward sweep\nAll the \\(g(u)\\)’s that get used have \\(g'(u)\\) “nice”.\nIf \\(g\\) is ReLu:\n\n\\(g(u) = xI(x>0)\\)\n\\(g'(u) = I(x>0)\\)\n\n\n\nOnce we have derivatives from backprop,\n\\[\n\\begin{align}\n\\widetilde{B} &\\leftarrow B - \\gamma \\frac{\\partial \\widehat{R}}{\\partial B}\\\\\n\\widetilde{\\mathbf{W}_t} &\\leftarrow \\mathbf{W}_t - \\gamma \\frac{\\partial \\widehat{R}}{\\partial \\mathbf{W}_t}\n\\end{align}\n\\]"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-1",
- "href": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-1",
+ "objectID": "schedule/slides/22-nnets-estimation.html#chain-rule",
+ "href": "schedule/slides/22-nnets-estimation.html#chain-rule",
"title": "UBC Stat406 2024W",
- "section": "Claim: Classification is easier than regression",
- "text": "Claim: Classification is easier than regression\n\n\nCode\nset.seed(12345)\nx <- 1:99 / 100\ny <- rbinom(99, 1, \n .25 + .5 * (x > .3 & x < .5) + \n .6 * (x > .7))\ndmat <- as.matrix(dist(x))\nksm <- function(sigma) {\n gg <- dnorm(dmat, sd = sigma) \n sweep(gg, 1, rowSums(gg), '/') %*% y\n}\nfstar <- ksm(.04)\ngg <- tibble(x = x, fstar = fstar, y = y) %>%\n ggplot(aes(x)) +\n geom_point(aes(y = y), color = blue) +\n geom_line(aes(y = fstar), color = orange, size = 2) +\n coord_cartesian(ylim = c(0,1), xlim = c(0,1)) +\n annotate(\"label\", x = .75, y = .65, label = \"f_star\", size = 5)\ngg"
+ "section": "Chain rule",
+ "text": "Chain rule\nWe want \\(\\frac{\\partial}{\\partial B} \\hat{R}_i\\) and \\(\\frac{\\partial}{\\partial W_{t}}\\hat{R}_i\\) for all \\(t\\).\nRegression: \\(\\hat{R}_i = \\frac{1}{2}(y_i - \\hat{y}_i)^2\\)\n\\[\\begin{aligned}\n\\frac{\\partial\\hat{R}_i}{\\partial B} &= -(y_i - \\hat{y}_i)\\frac{\\partial \\hat{y_i}}{\\partial B} =\\underbrace{-(y_i - \\hat{y}_i)}_{-r_i} \\mathbf{A}^{(T)}\\\\\n\\frac{\\partial}{\\partial \\mathbf{W}_T} \\hat{R}_i &= -(y_i - \\hat{y}_i)\\frac{\\partial\\hat{y_i}}{\\partial \\mathbf{W}_T} = -r_i \\frac{\\partial \\hat{y}_i}{\\partial \\mathbf{A}^{(T)}} \\frac{\\partial \\mathbf{A}^{(T)}}{\\partial \\mathbf{W}_T}\\\\\n&= -\\left(r_i B \\odot g'(\\mathbf{W}_T \\mathbf{A}^{(T)}) \\right) \\left(\\mathbf{A}^{(T-1)}\\right)^\\top\\\\\n\\frac{\\partial}{\\partial \\mathbf{W}_{T-1}} \\hat{R}_i &= -(y_i - \\hat{y}_i)\\frac{\\partial\\hat{y_i}}{\\partial \\mathbf{W}_{T-1}} = -r_i \\frac{\\partial \\hat{y}_i}{\\partial \\mathbf{A}^{(T)}} \\frac{\\partial \\mathbf{A}^{(T)}}{\\partial \\mathbf{W}_{T-1}}\\\\\n&= -r_i \\frac{\\partial \\hat{y}_i}{\\partial \\mathbf{A}^{(T)}} \\frac{\\partial \\mathbf{A}^{(T)}}{\\partial \\mathbf{W}_{T}}\\frac{\\partial \\mathbf{W}_{T}}{\\partial \\mathbf{A}^{(T-1)}}\\frac{\\partial \\mathbf{A}^{(T-1)}}{\\partial \\mathbf{W}_{T-1}}\\\\\n\\cdots &= \\cdots\n\\end{aligned}\\]"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-2",
- "href": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-2",
+ "objectID": "schedule/slides/22-nnets-estimation.html#mapping-it-out",
+ "href": "schedule/slides/22-nnets-estimation.html#mapping-it-out",
"title": "UBC Stat406 2024W",
- "section": "Claim: Classification is easier than regression",
- "text": "Claim: Classification is easier than regression\n\n\nCode\ngg + geom_hline(yintercept = .5, color = green)"
+ "section": "Mapping it out",
+ "text": "Mapping it out\nGiven current \\(\\mathbf{W}_t, B\\), we want to get new, \\(\\widetilde{\\mathbf{W}}_t,\\ \\widetilde B\\) for \\(t=1,\\ldots,T\\)\n\nSquared error for regression, cross-entropy for classification\n\n\n\nFeed forward \n\\[\\mathbf{A}^{(0)} = \\mathbf{X} \\in \\R^{n\\times p}\\]\nRepeat, \\(t= 1,\\ldots, T\\)\n\n\\(\\mathbf{Z}_{t} = \\mathbf{A}^{(t-1)}\\mathbf{W}_t \\in \\R^{n\\times K_t}\\)\n\\(\\mathbf{A}^{(t)} = g(\\mathbf{Z}_{t})\\) (component wise)\n\\(\\dot{\\mathbf{A}}^{(t)} = g'(\\mathbf{Z}_t)\\)\n\n\\[\\begin{cases}\n\\hat{\\mathbf{y}} =\\mathbf{A}^{(T)} B \\in \\R^n \\\\\n\\hat{\\Pi} = \\left(1 + \\exp\\left(-\\mathbf{A}^{(T)}\\mathbf{B}\\right)\\right)^{-1} \\in \\R^{n \\times M}\\end{cases}\\]\n\n\nBack propogate \n\\[-r = \\begin{cases}\n-\\left(\\mathbf{y} - \\widehat{\\mathbf{y}}\\right) \\\\\n-\\left(1 - \\widehat{\\Pi}\\right)[y]\\end{cases}\\]\n\\[\n\\begin{aligned}\n\\frac{\\partial}{\\partial \\mathbf{B}} \\widehat{R} &= \\left(\\mathbf{A}^{(T)}\\right)^\\top \\mathbf{r}\\\\\n-\\boldsymbol{\\Gamma} &\\leftarrow -\\mathbf{r}\\\\\n\\mathbf{W}_{T+1} &\\leftarrow \\mathbf{B}\n\\end{aligned}\n\\]\nRepeat, \\(t = T,...,1\\),\n\n\\(-\\boldsymbol{\\Gamma} \\leftarrow -\\left(\\boldsymbol{\\Gamma} \\mathbf{W}_{t+1}\\right) \\odot\\dot{\\mathbf{A}}^{(t)}\\)\n\\(\\frac{\\partial R}{\\partial \\mathbf{W}_t} = -\\left(\\mathbf{A}^{(t)}\\right)^\\top \\Gamma\\)"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-3",
- "href": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-3",
+ "objectID": "schedule/slides/22-nnets-estimation.html#deep-nets",
+ "href": "schedule/slides/22-nnets-estimation.html#deep-nets",
"title": "UBC Stat406 2024W",
- "section": "Claim: Classification is easier than regression",
- "text": "Claim: Classification is easier than regression\n\n\nCode\ntib <- tibble(x = x, fstar = fstar, y = y)\nggplot(tib) +\n geom_vline(data = filter(tib, fstar > 0.5), aes(xintercept = x), alpha = .5, color = green) +\n annotate(\"label\", x = .75, y = .65, label = \"f_star\", size = 5) + \n geom_point(aes(x = x, y = y), color = blue) +\n geom_line(aes(x = x, y = fstar), color = orange, size = 2) +\n coord_cartesian(ylim = c(0,1), xlim = c(0,1))"
+ "section": "Deep nets",
+ "text": "Deep nets\nSome comments on adding layers:\n\nIt has been shown that one hidden layer is sufficient to approximate any bounded piecewise continuous function\nHowever, this may take a huge number of hidden units (i.e. \\(K_1 \\gg 1\\)).\nThis is what people mean when they say that NNets are “universal approximators”\nBy including multiple layers, we can have fewer hidden units per layer.\nAlso, we can encode (in)dependencies that can speed computations\nWe don’t have to connect everything the way we have been"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#how-to-find-a-classifier",
- "href": "schedule/slides/14-classification-intro.html#how-to-find-a-classifier",
+ "objectID": "schedule/slides/22-nnets-estimation.html#simple-example",
+ "href": "schedule/slides/22-nnets-estimation.html#simple-example",
"title": "UBC Stat406 2024W",
- "section": "How to find a classifier",
- "text": "How to find a classifier\nWhy did we go through that math?\nEach of these approaches suggests a way to find a classifier\n\nEmpirical risk minimization: Choose a set of classifiers \\(\\mathcal{G}\\) and find \\(g \\in \\mathcal{G}\\) that minimizes some estimate of \\(R_n(g)\\)\n\n\n(This can be quite challenging as, unlike in regression, the training error is nonconvex)\n\n\nDensity estimation: Estimate \\(\\pi\\) and \\(p_k\\)\nRegression: Find an estimate \\(\\hat{f}\\) of \\(f^*\\) and compare the predicted value to 1/2"
+ "section": "Simple example",
+ "text": "Simple example\n\nn <- 200\ndf <- tibble(\n x = seq(.05, 1, length = n),\n y = sin(1 / x) + rnorm(n, 0, .1) # Doppler function\n)\ntestdata <- matrix(seq(.05, 1, length.out = 1e3), ncol = 1)\nlibrary(neuralnet)\nnn_out <- neuralnet(y ~ x, data = df, hidden = c(10, 5, 15), threshold = 0.01, rep = 3)\nnn_preds <- map(1:3, ~ compute(nn_out, testdata, .x)$net.result)\nyhat <- nn_preds |> bind_cols() |> rowMeans() # average over the runs\n\n\n\nCode\n# This code will reproduce the analysis, takes some time\nset.seed(406406406)\nn <- 200\ndf <- tibble(\n x = seq(.05, 1, length = n),\n y = sin(1 / x) + rnorm(n, 0, .1) # Doppler function\n)\ntestx <- matrix(seq(.05, 1, length.out = 1e3), ncol = 1)\nlibrary(neuralnet)\nlibrary(splines)\nfstar <- sin(1 / testx)\nspline_test_err <- function(k) {\n fit <- lm(y ~ bs(x, df = k), data = df)\n yhat <- predict(fit, newdata = tibble(x = testx))\n mean((yhat - fstar)^2)\n}\nKs <- 1:15 * 10\nSplineErr <- map_dbl(Ks, ~ spline_test_err(.x))\n\nJgrid <- c(5, 10, 15)\nNNerr <- double(length(Jgrid)^3)\nNNplot <- character(length(Jgrid)^3)\nsweep <- 0\nfor (J1 in Jgrid) {\n for (J2 in Jgrid) {\n for (J3 in Jgrid) {\n sweep <- sweep + 1\n NNplot[sweep] <- paste(J1, J2, J3, sep = \" \")\n nn_out <- neuralnet(y ~ x, df,\n hidden = c(J1, J2, J3),\n threshold = 0.01, rep = 3\n )\n nn_results <- sapply(1:3, function(x) {\n compute(nn_out, testx, x)$net.result\n })\n # Run them through the neural network\n Yhat <- rowMeans(nn_results)\n NNerr[sweep] <- mean((Yhat - fstar)^2)\n }\n }\n}\n\nbestK <- Ks[which.min(SplineErr)]\nbestspline <- predict(lm(y ~ bs(x, bestK), data = df), newdata = tibble(x = testx))\nbesthidden <- as.numeric(unlist(strsplit(NNplot[which.min(NNerr)], \" \")))\nnn_out <- neuralnet(y ~ x, df, hidden = besthidden, threshold = 0.01, rep = 3)\nnn_results <- sapply(1:3, function(x) compute(nn_out, testdata, x)$net.result)\n# Run them through the neural network\nbestnn <- rowMeans(nn_results)\nplotd <- data.frame(\n x = testdata, spline = bestspline, nnet = bestnn, truth = fstar\n)\nsave.image(file = \"data/nnet-example.Rdata\")"
},
{
- "objectID": "schedule/slides/14-classification-intro.html#section-1",
- "href": "schedule/slides/14-classification-intro.html#section-1",
+ "objectID": "schedule/slides/22-nnets-estimation.html#different-architectures",
+ "href": "schedule/slides/22-nnets-estimation.html#different-architectures",
"title": "UBC Stat406 2024W",
- "section": "",
- "text": "Easiest classifier when \\(y\\in \\{0,\\ 1\\}\\):\n(stupidest version of the third case…)\n\nghat <- round(predict(lm(y ~ ., data = trainingdata)))\n\nThink about why this may not be very good. (At least 2 reasons I can think of.)"
+ "section": "Different architectures",
+ "text": "Different architectures"
},
{
"objectID": "schedule/slides/04-bias-variance.html#section",
@@ -3430,67 +3332,137 @@
"text": "Out-of-bag error estimation for bagging / RF\nFor randomForest(), predict() without passing newdata = gives the OOB prediction\nnot like lm() where it gives the fitted values\n\ntab <- table(predict(bag), train$mobile) \nkbl(tab) |> add_header_above(c(\"Truth\" = 1, \"Bagging\" = 2))\n\n\n\n\n\n\n\n\n\n\nTruth\n\n\nBagging\n\n\n\n\nFALSE\nTRUE\n\n\n\n\nFALSE\n182\n28\n\n\nTRUE\n21\n82\n\n\n\n\n\n\n1 - sum(diag(tab)) / sum(tab) ## OOB misclassification error, no need for CV\n\n[1] 0.1565495"
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#section",
- "href": "schedule/slides/22-nnets-estimation.html#section",
+ "objectID": "schedule/slides/14-classification-intro.html#section",
+ "href": "schedule/slides/14-classification-intro.html#section",
"title": "UBC Stat406 2024W",
- "section": "22 Neural nets - estimation",
- "text": "22 Neural nets - estimation\nStat 406\nGeoff Pleiss, Trevor Campbell\nLast modified – 16 November 2023\n\\[\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n\\]"
+ "section": "14 Classification",
+ "text": "14 Classification\nStat 406\nGeoff Pleiss, Trevor Campbell\nLast modified – 14 October 2024\n\\[\n\\DeclareMathOperator*{\\argmin}{argmin}\n\\DeclareMathOperator*{\\argmax}{argmax}\n\\DeclareMathOperator*{\\minimize}{minimize}\n\\DeclareMathOperator*{\\maximize}{maximize}\n\\DeclareMathOperator*{\\find}{find}\n\\DeclareMathOperator{\\st}{subject\\,\\,to}\n\\newcommand{\\E}{E}\n\\newcommand{\\Expect}[1]{\\E\\left[ #1 \\right]}\n\\newcommand{\\Var}[1]{\\mathrm{Var}\\left[ #1 \\right]}\n\\newcommand{\\Cov}[2]{\\mathrm{Cov}\\left[#1,\\ #2\\right]}\n\\newcommand{\\given}{\\ \\vert\\ }\n\\newcommand{\\X}{\\mathbf{X}}\n\\newcommand{\\x}{\\mathbf{x}}\n\\newcommand{\\y}{\\mathbf{y}}\n\\newcommand{\\P}{\\mathcal{P}}\n\\newcommand{\\R}{\\mathbb{R}}\n\\newcommand{\\norm}[1]{\\left\\lVert #1 \\right\\rVert}\n\\newcommand{\\snorm}[1]{\\lVert #1 \\rVert}\n\\newcommand{\\tr}[1]{\\mbox{tr}(#1)}\n\\newcommand{\\brt}{\\widehat{\\beta}^R_{s}}\n\\newcommand{\\brl}{\\widehat{\\beta}^R_{\\lambda}}\n\\newcommand{\\bls}{\\widehat{\\beta}_{ols}}\n\\newcommand{\\blt}{\\widehat{\\beta}^L_{s}}\n\\newcommand{\\bll}{\\widehat{\\beta}^L_{\\lambda}}\n\\newcommand{\\U}{\\mathbf{U}}\n\\newcommand{\\D}{\\mathbf{D}}\n\\newcommand{\\V}{\\mathbf{V}}\n\\]"
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#neural-network-terms-again-t-hidden-layers-regression",
- "href": "schedule/slides/22-nnets-estimation.html#neural-network-terms-again-t-hidden-layers-regression",
+ "objectID": "schedule/slides/14-classification-intro.html#an-overview-of-classification",
+ "href": "schedule/slides/14-classification-intro.html#an-overview-of-classification",
"title": "UBC Stat406 2024W",
- "section": "Neural Network terms again (T hidden layers, regression)",
- "text": "Neural Network terms again (T hidden layers, regression)\n\n\n\\[\n\\begin{aligned}\nA_{k}^{(1)} &= g\\left(\\sum_{j=1}^p w^{(1)}_{k,j} x_j\\right)\\\\\nA_{\\ell}^{(t)} &= g\\left(\\sum_{k=1}^{K_{t-1}} w^{(t)}_{\\ell,k} A_{k}^{(t-1)} \\right)\\\\\n\\hat{Y} &= z_m = \\sum_{\\ell=1}^{K_T} \\beta_{m,\\ell} A_{\\ell}^{(T)}\\ \\ (M = 1)\n\\end{aligned}\n\\]\n\n\\(B \\in \\R^{M\\times K_T}\\).\n\\(M=1\\) for regression\n\n\\(\\mathbf{W}_t \\in \\R^{K_2\\times K_1}\\) \\(t=1,\\ldots,T\\)"
+ "section": "An Overview of Classification",
+ "text": "An Overview of Classification\n\nA person arrives at an emergency room with a set of symptoms that could be 1 of 3 possible conditions. Which one is it?\nAn online banking service must be able to determine whether each transaction is fraudulent or not, using a customer’s location, past transaction history, etc.\nGiven a set of individuals sequenced DNA, can we determine whether various mutations are associated with different phenotypes?\n\n\nThese problems are not regression problems. They are classification problems.\n\n\nClassification involves a categorical response variable (no notion of “order”/“distance”)."
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#training-neural-networks.-first-choices",
- "href": "schedule/slides/22-nnets-estimation.html#training-neural-networks.-first-choices",
+ "objectID": "schedule/slides/14-classification-intro.html#setup",
+ "href": "schedule/slides/14-classification-intro.html#setup",
"title": "UBC Stat406 2024W",
- "section": "Training neural networks. First, choices",
- "text": "Training neural networks. First, choices\n\nChoose the architecture: how many layers, units per layer, what connections?\nChoose the loss: common choices (for each data point \\(i\\))\n\n\nRegression\n\n\\(\\hat{R}_i = \\frac{1}{2}(y_i - \\hat{y}_i)^2\\) (the 1/2 just makes the derivative nice)\n\nClassification\n\n\\(\\hat{R}_i = I(y_i = m)\\log( 1 + \\exp(-z_{im}))\\)\n\n\n\nChoose the activation function \\(g\\)"
+ "section": "Setup",
+ "text": "Setup\nIt begins just like regression: suppose we have observations \\[\\{(x_1,y_1),\\ldots,(x_n,y_n)\\}\\]\nAgain, we want to estimate a function that maps \\(X\\) to \\(Y\\) to predict as yet observed data.\n(This function is known as a classifier)\nThe same constraints apply:\n\nWe want a classifier that predicts test data, not just the training data.\nOften, this comes with the introduction of some bias to get lower variance and better predictions."
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#training-neural-networks-intuition",
- "href": "schedule/slides/22-nnets-estimation.html#training-neural-networks-intuition",
+ "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality",
+ "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality",
"title": "UBC Stat406 2024W",
- "section": "Training neural networks (intuition)",
- "text": "Training neural networks (intuition)\n\nWe need to estimate \\(B\\), \\(\\mathbf{W}_t\\), \\(t=1,\\ldots,T\\)\nWe want to minimize \\(\\hat{R} = \\sum_{i=1}^n \\hat{R}_i\\) as a function of all this.\nWe use gradient descent, but in this dialect, we call it back propagation\n\n\n\nDerivatives via the chain rule: computed by a forward and backward sweep\nAll the \\(g(u)\\)’s that get used have \\(g'(u)\\) “nice”.\nIf \\(g\\) is ReLu:\n\n\\(g(u) = xI(x>0)\\)\n\\(g'(u) = I(x>0)\\)\n\n\n\nOnce we have derivatives from backprop,\n\\[\n\\begin{align}\n\\widetilde{B} &\\leftarrow B - \\gamma \\frac{\\partial \\widehat{R}}{\\partial B}\\\\\n\\widetilde{\\mathbf{W}_t} &\\leftarrow \\mathbf{W}_t - \\gamma \\frac{\\partial \\widehat{R}}{\\partial \\mathbf{W}_t}\n\\end{align}\n\\]"
+ "section": "How do we measure quality?",
+ "text": "How do we measure quality?\nBefore in regression, we have \\(y_i \\in \\mathbb{R}\\) and use \\((y - \\hat{y})^2\\) loss to measure accuracy.\nInstead, let \\(y \\in \\mathcal{K} = \\{1,\\ldots, K\\}\\)\n(This is arbitrary, sometimes other numbers, such as \\(\\{-1,1\\}\\) will be used)\nWe will usually convert categories/“factors” (e.g. \\(\\{\\textrm{cat},\\textrm{dog}\\}\\)) to integers.\nWe again make predictions \\(\\hat{y}=k\\) based on the data\n\nWe get zero loss if we predict the right class\nWe lose \\(\\ell(k,k')\\) on \\((k\\neq k')\\) for incorrect predictions"
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#chain-rule",
- "href": "schedule/slides/22-nnets-estimation.html#chain-rule",
+ "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-1",
+ "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-1",
"title": "UBC Stat406 2024W",
- "section": "Chain rule",
- "text": "Chain rule\nWe want \\(\\frac{\\partial}{\\partial B} \\hat{R}_i\\) and \\(\\frac{\\partial}{\\partial W_{t}}\\hat{R}_i\\) for all \\(t\\).\nRegression: \\(\\hat{R}_i = \\frac{1}{2}(y_i - \\hat{y}_i)^2\\)\n\\[\\begin{aligned}\n\\frac{\\partial\\hat{R}_i}{\\partial B} &= -(y_i - \\hat{y}_i)\\frac{\\partial \\hat{y_i}}{\\partial B} =\\underbrace{-(y_i - \\hat{y}_i)}_{-r_i} \\mathbf{A}^{(T)}\\\\\n\\frac{\\partial}{\\partial \\mathbf{W}_T} \\hat{R}_i &= -(y_i - \\hat{y}_i)\\frac{\\partial\\hat{y_i}}{\\partial \\mathbf{W}_T} = -r_i \\frac{\\partial \\hat{y}_i}{\\partial \\mathbf{A}^{(T)}} \\frac{\\partial \\mathbf{A}^{(T)}}{\\partial \\mathbf{W}_T}\\\\\n&= -\\left(r_i B \\odot g'(\\mathbf{W}_T \\mathbf{A}^{(T)}) \\right) \\left(\\mathbf{A}^{(T-1)}\\right)^\\top\\\\\n\\frac{\\partial}{\\partial \\mathbf{W}_{T-1}} \\hat{R}_i &= -(y_i - \\hat{y}_i)\\frac{\\partial\\hat{y_i}}{\\partial \\mathbf{W}_{T-1}} = -r_i \\frac{\\partial \\hat{y}_i}{\\partial \\mathbf{A}^{(T)}} \\frac{\\partial \\mathbf{A}^{(T)}}{\\partial \\mathbf{W}_{T-1}}\\\\\n&= -r_i \\frac{\\partial \\hat{y}_i}{\\partial \\mathbf{A}^{(T)}} \\frac{\\partial \\mathbf{A}^{(T)}}{\\partial \\mathbf{W}_{T}}\\frac{\\partial \\mathbf{W}_{T}}{\\partial \\mathbf{A}^{(T-1)}}\\frac{\\partial \\mathbf{A}^{(T-1)}}{\\partial \\mathbf{W}_{T-1}}\\\\\n\\cdots &= \\cdots\n\\end{aligned}\\]"
+ "section": "How do we measure quality?",
+ "text": "How do we measure quality?\nExample: You’re trying to build a fun widget to classify images of cats and dogs.\n\n\n\nLoss\nPredict Dog\nPredict Cat\n\n\n\n\nActual Dog\n0\n?\n\n\nActual Cat\n?\n0\n\n\n\n\nUse the zero-one loss (1 if wrong, 0 if right). Type of error doesn’t matter.\n\n\n\nLoss\nPredict Dog\nPredict Cat\n\n\n\n\nActual Dog\n0\n1\n\n\nActual Cat\n1\n0"
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#mapping-it-out",
- "href": "schedule/slides/22-nnets-estimation.html#mapping-it-out",
+ "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-2",
+ "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-2",
"title": "UBC Stat406 2024W",
- "section": "Mapping it out",
- "text": "Mapping it out\nGiven current \\(\\mathbf{W}_t, B\\), we want to get new, \\(\\widetilde{\\mathbf{W}}_t,\\ \\widetilde B\\) for \\(t=1,\\ldots,T\\)\n\nSquared error for regression, cross-entropy for classification\n\n\n\nFeed forward \n\\[\\mathbf{A}^{(0)} = \\mathbf{X} \\in \\R^{n\\times p}\\]\nRepeat, \\(t= 1,\\ldots, T\\)\n\n\\(\\mathbf{Z}_{t} = \\mathbf{A}^{(t-1)}\\mathbf{W}_t \\in \\R^{n\\times K_t}\\)\n\\(\\mathbf{A}^{(t)} = g(\\mathbf{Z}_{t})\\) (component wise)\n\\(\\dot{\\mathbf{A}}^{(t)} = g'(\\mathbf{Z}_t)\\)\n\n\\[\\begin{cases}\n\\hat{\\mathbf{y}} =\\mathbf{A}^{(T)} B \\in \\R^n \\\\\n\\hat{\\Pi} = \\left(1 + \\exp\\left(-\\mathbf{A}^{(T)}\\mathbf{B}\\right)\\right)^{-1} \\in \\R^{n \\times M}\\end{cases}\\]\n\n\nBack propogate \n\\[-r = \\begin{cases}\n-\\left(\\mathbf{y} - \\widehat{\\mathbf{y}}\\right) \\\\\n-\\left(1 - \\widehat{\\Pi}\\right)[y]\\end{cases}\\]\n\\[\n\\begin{aligned}\n\\frac{\\partial}{\\partial \\mathbf{B}} \\widehat{R} &= \\left(\\mathbf{A}^{(T)}\\right)^\\top \\mathbf{r}\\\\\n-\\boldsymbol{\\Gamma} &\\leftarrow -\\mathbf{r}\\\\\n\\mathbf{W}_{T+1} &\\leftarrow \\mathbf{B}\n\\end{aligned}\n\\]\nRepeat, \\(t = T,...,1\\),\n\n\\(-\\boldsymbol{\\Gamma} \\leftarrow -\\left(\\boldsymbol{\\Gamma} \\mathbf{W}_{t+1}\\right) \\odot\\dot{\\mathbf{A}}^{(t)}\\)\n\\(\\frac{\\partial R}{\\partial \\mathbf{W}_t} = -\\left(\\mathbf{A}^{(t)}\\right)^\\top \\Gamma\\)"
+ "section": "How do we measure quality?",
+ "text": "How do we measure quality?\nExample: Suppose you have a fever of 39º C. You get a rapid test on campus.\n\n\n\nLoss\nTest +\nTest -\n\n\n\n\nAre +\n0\n? (Infect others)\n\n\nAre -\n? (Isolation)\n0\n\n\n\n\nUse a weighted loss; type of error matters!\n\n\n\nLoss\nTest +\nTest -\n\n\n\n\nAre +\n0\n(LARGE)\n\n\nAre -\n1\n0\n\n\n\nNote that one class is “important”: we sometimes call that one positive. Errors are false positive and false negative.\nIn practice, you have to design your loss (just like before) to reflect what you care about."
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#deep-nets",
- "href": "schedule/slides/22-nnets-estimation.html#deep-nets",
+ "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-3",
+ "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-3",
"title": "UBC Stat406 2024W",
- "section": "Deep nets",
- "text": "Deep nets\nSome comments on adding layers:\n\nIt has been shown that one hidden layer is sufficient to approximate any bounded piecewise continuous function\nHowever, this may take a huge number of hidden units (i.e. \\(K_1 \\gg 1\\)).\nThis is what people mean when they say that NNets are “universal approximators”\nBy including multiple layers, we can have fewer hidden units per layer.\nAlso, we can encode (in)dependencies that can speed computations\nWe don’t have to connect everything the way we have been"
+ "section": "How do we measure quality?",
+ "text": "How do we measure quality?\nWe’re going to use \\(g(x)\\) to be our classifier. It takes values in \\(\\mathcal{K}\\).\nConsider the risk \\[R_n(g) = E [\\ell(Y,g(X))]\\] If we use the law of total probability, this can be written \\[R_n(g) = E\\left[\\sum_{y=1}^K \\ell(y,\\; g(X)) Pr(Y = y \\given X)\\right]\\] We minimize this over a class of options \\(\\mathcal{G}\\), to produce \\[g_*(X) = \\argmin_{g\\in\\mathcal{G}} E\\left[\\sum_{y=1}^K \\ell(y,g(X)) Pr(Y = y \\given X)\\right]\\]"
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#simple-example",
- "href": "schedule/slides/22-nnets-estimation.html#simple-example",
+ "objectID": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-4",
+ "href": "schedule/slides/14-classification-intro.html#how-do-we-measure-quality-4",
"title": "UBC Stat406 2024W",
- "section": "Simple example",
- "text": "Simple example\n\nn <- 200\ndf <- tibble(\n x = seq(.05, 1, length = n),\n y = sin(1 / x) + rnorm(n, 0, .1) # Doppler function\n)\ntestdata <- matrix(seq(.05, 1, length.out = 1e3), ncol = 1)\nlibrary(neuralnet)\nnn_out <- neuralnet(y ~ x, data = df, hidden = c(10, 5, 15), threshold = 0.01, rep = 3)\nnn_preds <- map(1:3, ~ compute(nn_out, testdata, .x)$net.result)\nyhat <- nn_preds |> bind_cols() |> rowMeans() # average over the runs\n\n\n\nCode\n# This code will reproduce the analysis, takes some time\nset.seed(406406406)\nn <- 200\ndf <- tibble(\n x = seq(.05, 1, length = n),\n y = sin(1 / x) + rnorm(n, 0, .1) # Doppler function\n)\ntestx <- matrix(seq(.05, 1, length.out = 1e3), ncol = 1)\nlibrary(neuralnet)\nlibrary(splines)\nfstar <- sin(1 / testx)\nspline_test_err <- function(k) {\n fit <- lm(y ~ bs(x, df = k), data = df)\n yhat <- predict(fit, newdata = tibble(x = testx))\n mean((yhat - fstar)^2)\n}\nKs <- 1:15 * 10\nSplineErr <- map_dbl(Ks, ~ spline_test_err(.x))\n\nJgrid <- c(5, 10, 15)\nNNerr <- double(length(Jgrid)^3)\nNNplot <- character(length(Jgrid)^3)\nsweep <- 0\nfor (J1 in Jgrid) {\n for (J2 in Jgrid) {\n for (J3 in Jgrid) {\n sweep <- sweep + 1\n NNplot[sweep] <- paste(J1, J2, J3, sep = \" \")\n nn_out <- neuralnet(y ~ x, df,\n hidden = c(J1, J2, J3),\n threshold = 0.01, rep = 3\n )\n nn_results <- sapply(1:3, function(x) {\n compute(nn_out, testx, x)$net.result\n })\n # Run them through the neural network\n Yhat <- rowMeans(nn_results)\n NNerr[sweep] <- mean((Yhat - fstar)^2)\n }\n }\n}\n\nbestK <- Ks[which.min(SplineErr)]\nbestspline <- predict(lm(y ~ bs(x, bestK), data = df), newdata = tibble(x = testx))\nbesthidden <- as.numeric(unlist(strsplit(NNplot[which.min(NNerr)], \" \")))\nnn_out <- neuralnet(y ~ x, df, hidden = besthidden, threshold = 0.01, rep = 3)\nnn_results <- sapply(1:3, function(x) compute(nn_out, testdata, x)$net.result)\n# Run them through the neural network\nbestnn <- rowMeans(nn_results)\nplotd <- data.frame(\n x = testdata, spline = bestspline, nnet = bestnn, truth = fstar\n)\nsave.image(file = \"data/nnet-example.Rdata\")"
+ "section": "How do we measure quality?",
+ "text": "How do we measure quality?\n\\(g_*\\) is named the Bayes’ classifier for loss \\(\\ell\\) in class \\(\\mathcal{G}\\).\n\\(R_n(g_*)\\) is the called the Bayes’ limit or Bayes’ Risk.\nIt’s the best we could hope to do even if we knew the distribution of the data (recall irreducible error!)\nBut we don’t, so we’ll try to do our best to estimate \\(g_*\\)."
},
{
- "objectID": "schedule/slides/22-nnets-estimation.html#different-architectures",
- "href": "schedule/slides/22-nnets-estimation.html#different-architectures",
+ "objectID": "schedule/slides/14-classification-intro.html#best-classifier-overall",
+ "href": "schedule/slides/14-classification-intro.html#best-classifier-overall",
"title": "UBC Stat406 2024W",
- "section": "Different architectures",
- "text": "Different architectures"
+ "section": "Best classifier overall",
+ "text": "Best classifier overall\nSuppose we actually know the distribution of everything, and we’ve picked \\(\\ell\\) to be the zero-one loss\n\\[\\ell(y,\\ g(x)) = \\begin{cases}0 & y=g(x)\\\\1 & y\\neq g(x) \\end{cases}\\]\n\n\n\nLoss\nTest +\nTest -\n\n\n\n\nAre +\n0\n1\n\n\nAre -\n1\n0\n\n\n\nThen\n\\[R_n(g) = \\Expect{\\ell(Y,\\ g(X))} = Pr(g(X) \\neq Y)\\]"
+ },
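Under zero-one loss, the empirical risk is just the misclassification rate; a tiny illustrative R check (the labels below are made up):

# With 0-1 loss, the average loss over the data is the fraction misclassified,
# an estimate of Pr(g(X) != Y).
y    <- c(1, 0, 1, 1, 0)
yhat <- c(1, 1, 1, 0, 0)
mean(yhat != y)  # 0.4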
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#best-classifier-overall-1",
+ "href": "schedule/slides/14-classification-intro.html#best-classifier-overall-1",
+ "title": "UBC Stat406 2024W",
+ "section": "Best classifier overall",
+ "text": "Best classifier overall\nWant to classify a new observation \\((X,Y)\\) such that \\(g(X) = Y\\) with as high probability as possible. Under zero-one loss, we have\n\\[g_* = \\argmin_{g} Pr(g(X) \\neq Y) = \\argmin_g 1- \\Pr(g(X) = Y) = \\argmax_g \\Pr(g(X) = Y)\\]\n\n\\[\n\\begin{aligned}\ng_* &= \\argmax_{g} E[\\Pr(g(X) = Y | X)]\\\\\n&= \\argmax_{g} E\\left[\\sum_{k\\in\\mathcal{K}}1[g(X) = k]\\Pr(Y=k | X)\\right]\n\\end{aligned}\n\\]\n\n\nFor each \\(x\\), only one \\(k\\) can satisfy \\(g(x) = k\\). So for each \\(x\\),\n\\[\ng_*(x) = \\argmax_{k\\in\\mathcal{K}} \\Pr(Y = k | X = x).\n\\]"
+ },
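A small R sketch of the conclusion, assuming the conditional class probabilities were known (the probability matrix below is invented for illustration):

# Under 0-1 loss, the Bayes classifier picks the most probable class at each x.
bayes_classifier <- function(prob_matrix) {
  # prob_matrix: n x K, row i holds Pr(Y = k | X = x_i) for k = 1, ..., K
  apply(prob_matrix, 1, which.max)
}
probs <- rbind(c(0.7, 0.2, 0.1), c(0.1, 0.3, 0.6))  # two x's, K = 3 classes
bayes_classifier(probs)  # 1 2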
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#estimating-g_-approach-1-empirical-risk-minimization",
+ "href": "schedule/slides/14-classification-intro.html#estimating-g_-approach-1-empirical-risk-minimization",
+ "title": "UBC Stat406 2024W",
+ "section": "Estimating \\(g_*\\) Approach 1: Empirical risk minimization",
+ "text": "Estimating \\(g_*\\) Approach 1: Empirical risk minimization\n\nChoose some class of classifiers \\(\\mathcal{G}\\).\nFind \\(\\argmin_{g\\in\\mathcal{G}} \\sum_{i = 1}^n I(g(x_i) \\neq y_i)\\)"
+ },
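A toy R sketch of empirical risk minimization over an assumed simple family of threshold classifiers g(x) = 1{x > c} (both the family and the simulated data are illustrative):

# ERM: pick the classifier in G with the fewest training misclassifications.
set.seed(1)
x <- runif(100)
y <- as.integer(x + rnorm(100, 0, 0.2) > 0.5)  # noisy 0/1 labels
cands <- sort(unique(x))                        # candidate thresholds c
train_err <- sapply(cands, function(c) mean(as.integer(x > c) != y))
c_hat <- cands[which.min(train_err)]            # ERM solution within this family
c_hat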
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#estimating-g_-approach-2-class-densities",
+ "href": "schedule/slides/14-classification-intro.html#estimating-g_-approach-2-class-densities",
+ "title": "UBC Stat406 2024W",
+ "section": "Estimating \\(g_*\\) Approach 2: Class densities",
+ "text": "Estimating \\(g_*\\) Approach 2: Class densities\nConsider 2 classes \\(\\{0,1\\}\\): using Bayes’ theorem (and being loose with notation),\n\\[\\begin{aligned}\n\\Pr(Y=1 \\given X=x) &= \\frac{\\Pr(X=x\\given Y=1) \\Pr(Y=1)}{\\Pr(X=x)}\\\\\n&=\\frac{\\Pr(X=x\\given Y = 1) \\Pr(Y = 1)}{\\sum_{k \\in \\{0,1\\}} \\Pr(X=x\\given Y = k) \\Pr(Y = k)} \\\\\n&= \\frac{p_1(x) \\pi}{ p_1(x)\\pi + p_0(x)(1-\\pi)}\\end{aligned}\\]\n\nWe call \\(p_k(x)\\) the class (conditional) densities\n\\(\\pi\\) is the marginal probability \\(P(Y=1)\\)\nSimilar formula for \\(\\Pr(Y=0\\given X=x) = p_0(x)(1-\\pi)/(\\dots)\\)"
+ },
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#estimating-g_-approach-2-class-densities-1",
+ "href": "schedule/slides/14-classification-intro.html#estimating-g_-approach-2-class-densities-1",
+ "title": "UBC Stat406 2024W",
+ "section": "Estimating \\(g_*\\) Approach 2: Class densities",
+ "text": "Estimating \\(g_*\\) Approach 2: Class densities\nRecall \\(g_*(x) = \\argmax_k \\Pr(Y=k|x)\\); so we classify 1 if\n\\[\\frac{p_1(x) \\pi}{ p_1(x)\\pi + p_0(x)(1-\\pi)} > \\frac{p_0(x) (1-\\pi)}{ p_1(x)\\pi + p_0(x)(1-\\pi)}\\]\ni.e., the Bayes’ Classifier (best classifier for 0-1 loss) can be rewritten\n\\[g_*(X) = \\begin{cases}\n1 & \\textrm{ if } \\frac{p_1(X)}{p_0(X)} > \\frac{1-\\pi}{\\pi} \\\\\n0 & \\textrm{ otherwise}\n\\end{cases}\\]\nEstimate everything in the expression above.\n\nWe need to estimate \\(p_0\\), \\(p_1\\), \\(\\pi\\), \\(1-\\pi\\)\nEasily extended to more than two classes"
+ },
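A rough R sketch of this plug-in recipe for two classes, using kernel density estimates for p0, p1 and the sample proportion for pi (all data simulated for illustration):

# Estimate the class densities and the marginal class probability,
# then classify 1 when p1(x)/p0(x) > (1 - pi)/pi.
set.seed(2)
n <- 200
y <- rbinom(n, 1, 0.4)
x <- rnorm(n, mean = 2 * y)            # class-conditional densities differ in mean
pi_hat <- mean(y == 1)                 # estimate of Pr(Y = 1)
d0 <- density(x[y == 0]); d1 <- density(x[y == 1])
p0 <- approxfun(d0$x, d0$y); p1 <- approxfun(d1$x, d1$y)
g_hat <- function(x) as.integer(p1(x) / p0(x) > (1 - pi_hat) / pi_hat)
g_hat(c(0, 2))                         # classify two new points (expect 0, then 1)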
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#estimating-g_-approach-3-regression-discretization",
+ "href": "schedule/slides/14-classification-intro.html#estimating-g_-approach-3-regression-discretization",
+ "title": "UBC Stat406 2024W",
+ "section": "Estimating \\(g_*\\) Approach 3: Regression discretization",
+ "text": "Estimating \\(g_*\\) Approach 3: Regression discretization\n0-1 loss natural, but discrete. Let’s try using squared error: \\(\\ell(y,\\ f(x)) = (y - f(x))^2\\)\nWhat will be the optimal classifier here? (hint: think about regression)\n\nThe “Bayes’ Classifier” (sort of…minimizes risk) is just the regression function! \\[f_*(x) = \\Pr(Y = 1 \\given X=x) = E[ Y \\given X = x] \\]\nIn this case, \\(0\\leq f_*(x)\\leq 1\\) not discrete… How do we get a class prediction?\n\n\nDiscretize the output:\n\\[g(x) = \\begin{cases}0 & f_*(x) < 1/2\\\\1 & \\textrm{else}\\end{cases}\\]\n\nEstimate \\(\\hat f(x) = E[Y|X=x] = \\Pr(Y=1|X=x)\\) using any method we’ve learned so far.\nPredict 0 if \\(\\hat{f}(x)\\) is less than 1/2, else predict 1."
+ },
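A quick R sketch of approach 3 on simulated data: regress the 0/1 responses on x with any method (here an assumed cubic polynomial fit), then threshold the fitted values at 1/2:

set.seed(3)
n <- 200
x <- runif(n)
y <- rbinom(n, 1, plogis(6 * (x - 0.5)))  # Pr(Y = 1 | x) increasing in x
fit <- lm(y ~ poly(x, 3))                  # any regression estimate of E[Y | X = x]
g_hat <- as.integer(predict(fit) > 1 / 2)  # discretize the fitted values at 1/2
mean(g_hat != y)                           # training misclassification rate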
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression",
+ "href": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression",
+ "title": "UBC Stat406 2024W",
+ "section": "Claim: Classification is easier than regression",
+ "text": "Claim: Classification is easier than regression\n\nLet \\(\\hat{f}\\) be any estimate of \\(f_*\\)\nLet \\(\\widehat{g} (x) = \\begin{cases}0 & \\hat f(x) < 1/2\\\\1 & else\\end{cases}\\)\n\nProof by picture."
+ },
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-1",
+ "href": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-1",
+ "title": "UBC Stat406 2024W",
+ "section": "Claim: Classification is easier than regression",
+ "text": "Claim: Classification is easier than regression\n\n\nCode\nset.seed(12345)\nx <- 1:99 / 100\ny <- rbinom(99, 1, \n .25 + .5 * (x > .3 & x < .5) + \n .6 * (x > .7))\ndmat <- as.matrix(dist(x))\nksm <- function(sigma) {\n gg <- dnorm(dmat, sd = sigma) \n sweep(gg, 1, rowSums(gg), '/') %*% y\n}\nfstar <- ksm(.04)\ngg <- tibble(x = x, fstar = fstar, y = y) %>%\n ggplot(aes(x)) +\n geom_point(aes(y = y), color = blue) +\n geom_line(aes(y = fstar), color = orange, size = 2) +\n coord_cartesian(ylim = c(0,1), xlim = c(0,1)) +\n annotate(\"label\", x = .75, y = .65, label = \"f_star\", size = 5)\ngg"
+ },
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-2",
+ "href": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-2",
+ "title": "UBC Stat406 2024W",
+ "section": "Claim: Classification is easier than regression",
+ "text": "Claim: Classification is easier than regression\n\n\nCode\ngg + geom_hline(yintercept = .5, color = green)"
+ },
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-3",
+ "href": "schedule/slides/14-classification-intro.html#claim-classification-is-easier-than-regression-3",
+ "title": "UBC Stat406 2024W",
+ "section": "Claim: Classification is easier than regression",
+ "text": "Claim: Classification is easier than regression\n\n\nCode\ntib <- tibble(x = x, fstar = fstar, y = y)\nggplot(tib) +\n geom_vline(data = filter(tib, fstar > 0.5), aes(xintercept = x), alpha = .5, color = green) +\n annotate(\"label\", x = .75, y = .65, label = \"f_star\", size = 5) + \n geom_point(aes(x = x, y = y), color = blue) +\n geom_line(aes(x = x, y = fstar), color = orange, size = 2) +\n coord_cartesian(ylim = c(0,1), xlim = c(0,1))"
+ },
+ {
+ "objectID": "schedule/slides/14-classification-intro.html#how-to-find-a-classifier",
+ "href": "schedule/slides/14-classification-intro.html#how-to-find-a-classifier",
+ "title": "UBC Stat406 2024W",
+ "section": "How to find a classifier",
+ "text": "How to find a classifier\nWhy did we go through that math?\nEach of these approaches has strengths/drawbacks:\n\nEmpirical risk minimization: Minimize \\(R_n(g)\\) in some family \\(\\mathcal{G}\\)\n\n\n(This can be quite challenging as, unlike in regression, the training error is nonconvex)\n\n\nDensity estimation: Estimate \\(\\pi\\) and \\(p_k\\)\n\n\n(We have to estimate class densities to classify. Too roundabout?)\n\n\nRegression: Find an estimate \\(\\hat{f}\\approx E[Y|X=x]\\) and compare the predicted value to 1/2\n\n\n(Unnatural, estimates whole regression function when we’ll just discretize anyway)"
},
{
"objectID": "schedule/slides/16-logistic-regression.html#section",
diff --git a/sitemap.xml b/sitemap.xml
index 6906766..4d24009 100644
--- a/sitemap.xml
+++ b/sitemap.xml
@@ -2,194 +2,194 @@
https://UBC-STAT.github.io/stat-406/schedule/slides/00-r-review.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/handouts/keras-nnet.html
- 2024-10-09T01:08:11.955Z
+ 2024-10-14T21:40:34.410Z
https://UBC-STAT.github.io/stat-406/schedule/slides/11-kernel-smoothers.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/09-l1-penalties.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/18-the-bootstrap.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/23-nnets-other.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.422Z
https://UBC-STAT.github.io/stat-406/schedule/slides/05-estimating-test-mse.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/25-pca-issues.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.422Z
https://UBC-STAT.github.io/stat-406/schedule/slides/26-pca-v-kpca.html
- 2024-10-09T01:08:11.967Z
+ 2024-10-14T21:40:34.422Z
https://UBC-STAT.github.io/stat-406/schedule/slides/00-classification-losses.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/20-boosting.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.422Z
https://UBC-STAT.github.io/stat-406/schedule/slides/27-kmeans.html
- 2024-10-09T01:08:11.967Z
+ 2024-10-14T21:40:34.422Z
- https://UBC-STAT.github.io/stat-406/schedule/slides/14-classification-intro.html
- 2024-10-09T01:08:11.963Z
+ https://UBC-STAT.github.io/stat-406/schedule/slides/22-nnets-estimation.html
+ 2024-10-14T21:40:34.422Z
https://UBC-STAT.github.io/stat-406/schedule/slides/04-bias-variance.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/06-information-criteria.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/03-regression-function.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/21-nnets-intro.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.422Z
https://UBC-STAT.github.io/stat-406/faq.html
- 2024-10-09T01:08:11.935Z
+ 2024-10-14T21:40:34.390Z
https://UBC-STAT.github.io/stat-406/schedule/slides/00-intro-to-class.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/handouts/lab00-git.html
- 2024-10-09T01:08:11.955Z
+ 2024-10-14T21:40:34.410Z
https://UBC-STAT.github.io/stat-406/course-setup.html
- 2024-10-09T01:08:11.935Z
+ 2024-10-14T21:40:34.390Z
https://UBC-STAT.github.io/stat-406/computing/windows.html
- 2024-10-09T01:08:11.935Z
+ 2024-10-14T21:40:34.390Z
https://UBC-STAT.github.io/stat-406/computing/mac_x86.html
- 2024-10-09T01:08:11.935Z
+ 2024-10-14T21:40:34.390Z
https://UBC-STAT.github.io/stat-406/computing/index.html
- 2024-10-09T01:08:11.935Z
+ 2024-10-14T21:40:34.390Z
https://UBC-STAT.github.io/stat-406/index.html
- 2024-10-09T01:08:11.935Z
+ 2024-10-14T21:40:34.390Z
https://UBC-STAT.github.io/stat-406/computing/mac_arm.html
- 2024-10-09T01:08:11.935Z
+ 2024-10-14T21:40:34.390Z
https://UBC-STAT.github.io/stat-406/computing/ubuntu.html
- 2024-10-09T01:08:11.935Z
+ 2024-10-14T21:40:34.390Z
https://UBC-STAT.github.io/stat-406/syllabus.html
- 2024-10-09T01:08:11.987Z
+ 2024-10-14T21:40:34.446Z
https://UBC-STAT.github.io/stat-406/schedule/index.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/00-course-review.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/00-version-control.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/12-why-smooth.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/01-lm-review.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/00-cv-for-many-models.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/19-bagging-and-rf.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.422Z
- https://UBC-STAT.github.io/stat-406/schedule/slides/22-nnets-estimation.html
- 2024-10-09T01:08:11.963Z
+ https://UBC-STAT.github.io/stat-406/schedule/slides/14-classification-intro.html
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/16-logistic-regression.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/08-ridge-regression.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/00-quiz-0-wrap.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/15-LDA-and-QDA.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/24-pca-intro.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.422Z
https://UBC-STAT.github.io/stat-406/schedule/slides/13-gams-trees.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/10-basis-expansions.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/28-hclust.html
- 2024-10-09T01:08:11.967Z
+ 2024-10-14T21:40:34.422Z
https://UBC-STAT.github.io/stat-406/schedule/slides/07-greedy-selection.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/02-lm-example.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/17-nonlinear-classifiers.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z
https://UBC-STAT.github.io/stat-406/schedule/slides/00-gradient-descent.html
- 2024-10-09T01:08:11.963Z
+ 2024-10-14T21:40:34.418Z