Built site for gh-pages

UBC-STAT · Sep 16, 2024 · 27b6340
1 parent 8937e19
commit 27b6340

Showing 8 changed files with 667 additions and 686 deletions.
diff --git a/.nojekyll b/.nojekyll
@@ -1 +1 @@
-828ff237
+392dfcb7
diff --git a/schedule/slides/03-regression-function.html b/schedule/slides/03-regression-function.html
@@ -398,7 +398,7 @@
 <h2>03 The regression function</h2>
 <p><span class="secondary">Stat 406</span></p>
 <p><span class="secondary">Geoff Pleiss, Trevor Campbell</span></p>
-<p>Last modified – 14 September 2024</p>
+<p>Last modified – 16 September 2024</p>
 <p><span class="math display">\[
 \DeclareMathOperator*{\argmin}{argmin}
 \DeclareMathOperator*{\argmax}{argmax}
@@ -700,18 +700,11 @@ <h2>Example: Estimating/Predicting the (conditional) mean</h2>
 <p>Suppose we know that we want to predict a quantity <span class="math inline">\(Y\)</span>,</p>
 <p>where <span class="math inline">\(\Expect{Y}= \mu \in \mathbb{R}\)</span> and <span class="math inline">\(\Var{Y} = 1\)</span>.</p>
 <p>Our data is <span class="math inline">\(\{y_1,\ldots,y_n\}\)</span></p>
-<p>Claim: We want to estimate <span class="math inline">\(\mu\)</span>.</p>
-<div class="fragment">
-<p>Why?</p>
-</div>
+<p>We will use the sample mean <span class="math inline">\(\overline{Y}_n\)</span> to estimate <span class="math inline">\(\mu\)</span> and to predict <span class="math inline">\(Y\)</span>.</p>
 </section>
 <section id="estimating-the-mean" class="slide level2">
 <h2>Estimating the mean</h2>
-<ul>
-<li>Let <span class="math inline">\(\widehat{Y}=\overline{Y}_n\)</span> be the sample mean.<br>
-</li>
-<li>We can ask about the <em>estimation risk</em> (since we’re estimating <span class="math inline">\(\mu\)</span>):</li>
-</ul>
+<p>We evaluate the <em>estimation risk</em> (since we’re estimating <span class="math inline">\(\mu\)</span>) via:</p>
 <div class="flex">
 <div class="w-50">
 <span class="math display">\[\begin{aligned}
@@ -733,25 +726,24 @@ <h2>Estimating the mean</h2>
 </section>
 <section id="predicting-new-ys" class="slide level2">
 <h2>Predicting new Y’s</h2>
-<ul>
-<li>Let <span class="math inline">\(\widehat{Y}=\overline{Y}_n\)</span> be the sample mean.<br>
-</li>
-<li>What is the <em>prediction risk</em> of <span class="math inline">\(\overline{Y}\)</span>?</li>
-</ul>
+<p>We evaluate the <em>prediction risk</em> of <span class="math inline">\(\overline{Y}_n\)</span> (since we’re predicting <span class="math inline">\(Y\)</span>) via:</p>
 <div class="flex">
 <div class="w-50">
 <span class="math display">\[\begin{aligned}
   R_n(\overline{Y}_n)
   &amp;= \E[(\overline{Y}_n-Y)^2]\\
-  &amp;= \E[\overline{Y}_{n}^{2}] -2\E[\overline{Y}_n Y] + \E[Y^2] \\
-  &amp;= \mu^2 + \frac{1}{n} - 2\mu^2 + \mu^2 + 1 \\
-  &amp;= 1 + \frac{1}{n}
+  &amp;= \E[(\overline{Y}_n - \mu)^2] + \E[(\mu-Y)^2]\\
+  &amp;= \frac{1}{n} + 1
 \end{aligned}\]</span>
+<ul>
+<li><span class="math inline">\(1/n\)</span> for <em>estimation risk</em></li>
+<li><span class="math inline">\(1\)</span> for remaining noise in <span class="math inline">\(Y\)</span></li>
+</ul>
 </div>
 <div class="w-50">
 <p><span class="primary">Tricks:</span></p>
-<p>Used the variance thing again.</p>
-<p>If <span class="math inline">\(X\)</span> and <span class="math inline">\(Z\)</span> are independent, then <span class="math inline">\(\Expect{XZ} = \Expect{X}\Expect{Z}\)</span></p>
+<p>Add and subtract <span class="math inline">\(\mu\)</span> inside the square.</p>
+<p><span class="math inline">\(\overline{Y}_n\)</span> and <span class="math inline">\(Y\)</span> are independent, and both have mean <span class="math inline">\(\mu\)</span>.</p>
 </div>
 </div>
 </section>
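A quick Monte Carlo sketch can make the two risks on these slides concrete. Beyond what the slides state, it assumes the data and the new \(Y\) are i.i.d. normal with variance 1; the function name `simulate_risks` and the values of \(\mu\), \(n\) and the number of replications are illustrative choices only. The averaged squared errors should land near \(1/n\) for estimation and near \(1 + 1/n\) for prediction.

```python
import random
import statistics


def simulate_risks(mu: float, n: int, reps: int = 20_000, seed: int = 0) -> tuple[float, float]:
    """Monte Carlo estimates of the estimation and prediction risk of the sample mean.

    Hypothetical helper. Assumes y_1, ..., y_n and the new Y are i.i.d. N(mu, 1);
    normality is an assumption for the simulation, the slides only fix mean and variance.
    """
    rng = random.Random(seed)
    est_losses = []
    pred_losses = []
    for _ in range(reps):
        # Training sample y_1, ..., y_n and an independent new draw Y
        ybar = statistics.fmean(rng.gauss(mu, 1.0) for _ in range(n))
        y_new = rng.gauss(mu, 1.0)
        est_losses.append((ybar - mu) ** 2)      # squared error for estimating mu
        pred_losses.append((ybar - y_new) ** 2)  # squared error for predicting Y
    return statistics.fmean(est_losses), statistics.fmean(pred_losses)


est, pred = simulate_risks(mu=3.0, n=5)
print(f"estimation risk ~ {est:.3f} (theory 1/n = {1 / 5:.3f})")
print(f"prediction risk ~ {pred:.3f} (theory 1 + 1/n = {1 + 1 / 5:.3f})")
```

The gap between the two numbers is the noise in the new \(Y\), which no choice of estimator can remove.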

diff --git a/schedule/slides/04-bias-variance.html b/schedule/slides/04-bias-variance.html
@@ -398,7 +398,7 @@
 <h2>04 Bias and variance</h2>
 <p><span class="secondary">Stat 406</span></p>
 <p><span class="secondary">Geoff Pleiss, Trevor Campbell</span></p>
-<p>Last modified – 18 September 2023</p>
+<p>Last modified – 16 September 2024</p>
 <p><span class="math display">\[
 \DeclareMathOperator*{\argmin}{argmin}
 \DeclareMathOperator*{\argmax}{argmax}
@@ -418,6 +418,15 @@ <h2>04 Bias and variance</h2>
 \newcommand{\R}{\mathbb{R}}
 \newcommand{\norm}[1]{\left\lVert #1 \right\rVert}
 \newcommand{\snorm}[1]{\lVert #1 \rVert}
+\newcommand{\tr}[1]{\mbox{tr}(#1)}
+\newcommand{\brt}{\widehat{\beta}^R_{s}}
+\newcommand{\brl}{\widehat{\beta}^R_{\lambda}}
+\newcommand{\bls}{\widehat{\beta}_{ols}}
+\newcommand{\blt}{\widehat{\beta}^L_{s}}
+\newcommand{\bll}{\widehat{\beta}^L_{\lambda}}
+\newcommand{\U}{\mathbf{U}}
+\newcommand{\D}{\mathbf{D}}
+\newcommand{\V}{\mathbf{V}}
 \]</span></p>
 </section>
 <section id="section-1" class="slide level2" data-background-color="#e98a15">
@@ -433,13 +442,13 @@ <h3 id="we-just-talked-about">We just talked about</h3>
 <h2>Component 3, the Bias</h2>
 <p>We need to be specific about what we mean when we say <em>bias</em>.</p>
 <p>Bias is neither good nor bad in and of itself.</p>
-<p>A very simple example: let <span class="math inline">\(Z_1,\ \ldots,\ Z_n \sim N(\mu, 1)\)</span>. We don’t know <span class="math inline">\(\mu\)</span>, so we try to use the data (the <span class="math inline">\(Z_i\)</span>’s) to estimate it.</p>
+<p>A very simple example: let <span class="math inline">\(Y_1,\ \ldots,\ Y_n \sim N(\mu, 1)\)</span>. We don’t know <span class="math inline">\(\mu\)</span>, so we try to use the data (the <span class="math inline">\(Y_i\)</span>’s) to estimate it.</p>
 <ul>
 <li>I propose 3 estimators:
 <ol type="1">
 <li><p><span class="math inline">\(\widehat{\mu}_1 = 12\)</span>,</p></li>
-<li><p><span class="math inline">\(\widehat{\mu}_2=Z_6\)</span>,</p></li>
-<li><p><span class="math inline">\(\widehat{\mu}_3=\overline{Z}\)</span>.</p></li>
+<li><p><span class="math inline">\(\widehat{\mu}_2=Y_6\)</span>,</p></li>
+<li><p><span class="math inline">\(\widehat{\mu}_3=\overline{Y}\)</span>.</p></li>
 </ol></li>
 <li>The <span class="secondary">bias</span> (by definition) of my estimator is <span class="math inline">\(E[\widehat{\mu}_i]-\mu\)</span>.</li>
 </ul>
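To see the bias definition at work, the sketch below simulates the three proposed estimators and reports their Monte Carlo bias and variance. It uses i.i.d. \(N(\mu, 1)\) draws as on the slide and requires \(n \ge 6\) so that \(Y_6\) exists; the helper `bias_and_variance`, its dictionary keys, and the values \(\mu = 2\), \(n = 10\) are hypothetical choices for illustration. In theory the biases are \(12 - \mu\), 0 and 0, and the variances are 0, 1 and \(1/n\).

```python
import random
import statistics


def bias_and_variance(mu: float, n: int = 10, reps: int = 20_000,
                      seed: int = 1) -> dict[str, tuple[float, float]]:
    """Hypothetical helper: Monte Carlo bias E[mu_hat] - mu and variance of each estimator.

    Assumes Y_1, ..., Y_n i.i.d. N(mu, 1), as on the slide.
    """
    if n < 6:
        raise ValueError("n must be >= 6 so that Y_6 exists")
    rng = random.Random(seed)
    draws: dict[str, list[float]] = {"constant_12": [], "sixth_obs": [], "sample_mean": []}
    for _ in range(reps):
        y = [rng.gauss(mu, 1.0) for _ in range(n)]
        draws["constant_12"].append(12.0)              # mu_hat_1 ignores the data
        draws["sixth_obs"].append(y[5])                # mu_hat_2 = Y_6 (index 5, 0-based)
        draws["sample_mean"].append(statistics.fmean(y))  # mu_hat_3 = Ybar
    return {
        name: (statistics.fmean(vals) - mu, statistics.pvariance(vals))
        for name, vals in draws.items()
    }


results = bias_and_variance(mu=2.0)
for name, (bias, var) in results.items():
    print(f"{name:12s} bias ~ {bias:+.3f}  variance ~ {var:.3f}")
```

The constant estimator has zero variance but a large bias, while \(Y_6\) and \(\overline{Y}\) are both unbiased and differ only in variance.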
@@ -481,13 +490,17 @@ <h2>One can show… (wait for the proof)</h2>
 \frac{a^2}{n} +1
 \]</span></p>
 <div class="fragment">
-<p>We can minimize this in <span class="math inline">\(a\)</span> to get the best possible prediction risk for an estimator of the form <span class="math inline">\(\widehat Y_a\)</span>:</p>
+<p>We can minimize this to get the best possible prediction risk for an estimator of the form <span class="math inline">\(\widehat Y_a\)</span>:</p>
 <p><span class="math display">\[
-\argmin_{a} R_n(\widehat Y_a) = \left(\frac{\mu^2}{\mu^2 + 1/n} \right)
+\argmin_{a} R_n(\widehat Y_a) = \left(\frac{\mu^2}{\mu^2 + 1/n} \right)\qquad
+\min_{a} R_n(\widehat Y_a) = 1+\left(\frac{\mu^2}{n\mu^2 + 1} \right).
 \]</span></p>
 </div>
 <div class="fragment">
-<p>What happens if <span class="math inline">\(\mu \ll 1\)</span>?</p>
+<p>Is this less than or greater than the risk we saw for <span class="math inline">\(\overline{Y}_n\)</span>?</p>
+</div>
+<div class="fragment">
+<p>Am I cheating here?</p>
 </div>
 </section>
 <section id="section-2" class="slide level2">
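The minimizer and minimum above can be sanity-checked numerically. This sketch only evaluates the risk formula \(R_n(\widehat Y_a) = (a-1)^2\mu^2 + a^2/n + 1\) from these slides; the helper names `risk` and `best_a` and the values \(\mu = 0.5\), \(n = 10\) are illustrative assumptions. Note that both `best_a` and the closed-form minimum take \(\mu\) as an input.

```python
def risk(a: float, mu: float, n: int) -> float:
    """Prediction risk R_n(Y_hat_a) = (a - 1)^2 mu^2 + a^2 / n + 1 for Y_hat_a = a * Ybar."""
    return (a - 1) ** 2 * mu**2 + a**2 / n + 1


def best_a(mu: float, n: int) -> float:
    """Closed-form minimizer a* = mu^2 / (mu^2 + 1/n)."""
    return mu**2 / (mu**2 + 1 / n)


mu, n = 0.5, 10
a_star = best_a(mu, n)
min_risk = 1 + mu**2 / (n * mu**2 + 1)  # closed form for min_a R_n(Y_hat_a)

# Checks: plugging a* into the risk matches the closed form,
# a crude grid over [0, 2] cannot beat it, and a = 1 (plain Ybar) gives 1 + 1/n.
grid = [i / 1000 for i in range(2001)]
print(a_star, risk(a_star, mu, n), min_risk)
print(min(risk(a, mu, n) for a in grid) >= min_risk - 1e-12)
print(risk(1.0, mu, n), 1 + 1 / n)
```

Since \(n\mu^2 < n\mu^2 + 1\), the term \(\mu^2/(n\mu^2+1)\) is always below \(1/n\), which the last line illustrates for one choice of \(\mu\) and \(n\).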
@@ -531,18 +544,6 @@ <h2>To restate</h2>
 </ol>
 </div>
 </section>
-<section id="prediction-risk" class="slide level2">
-<h2>Prediction risk</h2>
-<p>(Now using generic prediction function <span class="math inline">\(f\)</span>)</p>
-<p><span class="math display">\[
-R_n(f) = \Expect{(Y - f(X))^2}
-\]</span></p>
-<p>Why should we care about <span class="math inline">\(R_n(f)\)</span>?</p>
-<p>👍 Measures predictive accuracy on average.</p>
-<p>👍 How much confidence should you have in <span class="math inline">\(f\)</span>’s predictions.</p>
-<p>👍 Compare with other predictors: <span class="math inline">\(R_n(f)\)</span> vs <span class="math inline">\(R_n(g)\)</span></p>
-<p>🤮 <em>This is hard:</em> Don’t know the distribution of the data (if I knew the truth, this would be easy)</p>
-</section>
 <section id="bias-variance-decomposition" class="slide level2">
 <h2>Bias-variance decomposition</h2>
 <p><span class="math display">\[R_n(\widehat{Y}_a)=(a - 1)^2\mu^2 + \frac{a^2}{n} + 1\]</span></p>
@@ -592,7 +593,7 @@ <h2>Bias-variance decomposition</h2>
 </div>
 <div class="callout-content">
 <div class="bigger">
-<p>Implication: prediction risk is proportional to estimation risk. However, defining estimation risk requires stronger assumptions.</p>
+<p>Implication: prediction risk is estimation risk plus an irreducible noise term that no predictor can remove. However, defining estimation risk requires stronger assumptions (we are not always just estimating a parameter).</p>
 </div>
 </div>
 </div>
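The decomposition can also be checked by simulation: estimate \(R_n(\widehat Y_a)\) directly and compare it with the squared bias and the variance of \(\widehat Y_a = a\overline{Y}_n\) plus the noise \(\mathrm{Var}(Y) = 1\). As before, normal data, the helper name `decompose`, and the values \(a = 0.8\), \(\mu = 1\), \(n = 5\) are assumptions for illustration; the theoretical pieces are \((a-1)^2\mu^2 = 0.04\), \(a^2/n = 0.128\) and 1, for a total of 1.168.

```python
import random
import statistics


def decompose(a: float, mu: float, n: int, reps: int = 40_000, seed: int = 2) -> dict[str, float]:
    """Hypothetical helper: Monte Carlo pieces of R_n(Y_hat_a) for Y_hat_a = a * Ybar.

    Assumes the sample and the new Y are i.i.d. N(mu, 1) (normality assumed for the simulation).
    """
    rng = random.Random(seed)
    preds, losses = [], []
    for _ in range(reps):
        y_hat = a * statistics.fmean(rng.gauss(mu, 1.0) for _ in range(n))
        y_new = rng.gauss(mu, 1.0)
        preds.append(y_hat)
        losses.append((y_hat - y_new) ** 2)
    return {
        "risk": statistics.fmean(losses),               # direct estimate of R_n(Y_hat_a)
        "bias^2": (statistics.fmean(preds) - mu) ** 2,  # theory: (a - 1)^2 mu^2
        "variance": statistics.pvariance(preds),        # theory: a^2 / n
        "noise": 1.0,                                   # Var(Y), known by construction here
    }


parts = decompose(a=0.8, mu=1.0, n=5)
for name, value in parts.items():
    print(f"{name:8s} ~ {value:.3f}")
print("bias^2 + variance + noise ~", round(parts["bias^2"] + parts["variance"] + parts["noise"], 3))
```

Only the first two pieces depend on \(a\); the noise term stays at 1 whatever predictor is used.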