Commit: Built site for gh-pages
Quarto GHA Workflow Runner committed Oct 8, 2024
1 parent 7625139 commit 8fa831a
Showing 5 changed files with 69 additions and 69 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
- 51465757
+ eafe6f9e
6 changes: 3 additions & 3 deletions schedule/slides/11-kernel-smoothers.html
@@ -408,7 +408,7 @@ <h2>1) The Bias of OLS</h2>
<p>A. Assume that <span class="math inline">\(y = x^\top \beta + \epsilon, \quad \epsilon \sim \mathcal N(0, \sigma^2).\)</span></p>
<p>B. Then <span class="math inline">\(E[\hat \beta_\mathrm{ols}] = E[ E[\hat \beta_\mathrm{ols} \mid \mathbf X] ]\)</span></p>
<p>C. <span class="math inline">\(E[\hat \beta_\mathrm{ols} \mid \mathbf X] = (\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top E[ \mathbf y \mid \mathbf X]\)</span></p>
- <p>D. <span class="math inline">\(E[ \mathbf y \mid \mathbf X] = \mathbf X^\top \beta\)</span></p>
+ <p>D. <span class="math inline">\(E[ \mathbf y \mid \mathbf X] = \mathbf X \beta\)</span></p>
<p>E. So <span class="math inline">\(E[\hat \beta_\mathrm{ols}] - \beta = E[(\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top \mathbf X \beta] - \beta = \beta - \beta = 0\)</span>.</p>
<p><br>
<span class="secondary">Why did this proof not apply to the clicker question?</span><br>
@@ -431,7 +431,7 @@ <h2>1) The Bias of OLS</h2>
\end{align}
\]</span></p>
<p>In statistics speak, our model is <em>misspecified</em>.<br>
<span class="small">Ridge/lasso will still increase bias and decrease variance even under misspecification.</span></p>
<span class="small">Ridge/lasso will always increase bias and decrease variance, even under misspecification.</span></p>
</div>
</section>
<section id="why-does-ridge-regression-shrink-varinace" class="slide level2">
@@ -452,7 +452,7 @@ <h3 id="intuitively">Intuitively…</h3>
<h2>11 Local methods</h2>
<p><span class="secondary">Stat 406</span></p>
<p><span class="secondary">Geoff Pleiss, Trevor Campbell</span></p>
- <p>Last modified – 07 October 2024</p>
+ <p>Last modified – 08 October 2024</p>
<p><span class="math display">\[
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
18 changes: 9 additions & 9 deletions schedule/slides/12-why-smooth.html
@@ -400,7 +400,7 @@ <h3 id="variance">Variance</h3>
<h3 id="bias">Bias</h3>
<ul>
<li>Basis: bias is <em>fixed</em><br>
<span class="small">Assuming <span class="math inline">\(k\)</span> is fixed</span></li>
<span class="small">Assuming num. basis features is fixed</span></li>
<li>Local: bias depends on choice of bandwidth <span class="math inline">\(\sigma\)</span>.</li>
</ul>
</div>
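A small sketch of the bandwidth-bias link in the second bullet (the Gaussian kernel, the sine truth, and the bandwidth grid are illustrative assumptions): as the bandwidth grows, a Nadaraya-Watson estimate at a point drifts away from the truth.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(-2, 2, size=n)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)

def nadaraya_watson(x0, x, y, bw):
    # Kernel-weighted average of y around x0 with Gaussian bandwidth bw.
    w = np.exp(-0.5 * ((x - x0) / bw) ** 2)
    return w @ y / w.sum()

for bw in (0.05, 0.3, 1.0):
    print(f"bw={bw}: f_hat(0.5) = {nadaraya_watson(0.5, x, y, bw):.2f}  (truth {np.sin(1.5):.2f})")
```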
@@ -444,8 +444,8 @@ <h3 id="what-do-you-notice">What do you notice?</h3>
<div class="fragment">
<ul>
<li>As <span class="math inline">\(n\)</span> increases, the optimal bandwidth <span class="math inline">\(\sigma\)</span> decreases</li>
<li>As <span class="math inline">\(n \to \infty\)</span>, <span class="math inline">\(R_n^{(\mathrm{basis})} \to C_1^{(\mathrm{basis})} + \sigma^2\)</span></li>
<li>As <span class="math inline">\(n \to \infty\)</span>, <span class="math inline">\(R_n^{(\mathrm{local})} \to \sigma^2\)</span></li>
<li><span class="math inline">\(R_n^{(\mathrm{basis})} \overset{n \to \infty}{\longrightarrow} C_1^{(\mathrm{basis})} + \sigma^2\)</span></li>
<li><span class="math inline">\(R_n^{(\mathrm{local})} \overset{n \to \infty}{\longrightarrow} \sigma^2\)</span></li>
</ul>
</div>
</div>
@@ -459,7 +459,7 @@ <h3 id="what-do-you-notice">What do you notice?</h3>
<section id="takeaway" class="slide level2">
<h2>Takeaway</h2>
<ol type="1">
- <li>Local methods are <em>consistent</em> (bias and variance go to 0 as <span class="math inline">\(n \to \infty\)</span>)</li>
+ <li>Local methods are <em>consistent universal approximators</em> (bias and variance go to 0 as <span class="math inline">\(n \to \infty\)</span>)</li>
<li>Fixed basis expansions are <em>biased</em> but have lower variance when <span class="math inline">\(n\)</span> is relatively small.<br>
<span class="small"><span class="math inline">\(\underbrace{O(1/n)}_{\text{basis var.}} &lt; \underbrace{O(1/n^{4/5})}_{\text{local var.}}\)</span></span></li>
</ol>
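Plugging a concrete sample size into point 2 makes the gap visible (numbers added here purely for illustration):

\[
n = 10^4: \qquad \underbrace{\tfrac{1}{n}}_{\text{basis var.}} = 10^{-4} \;<\; \underbrace{\tfrac{1}{n^{4/5}}}_{\text{local var.}} = 10^{-3.2} \approx 6.3 \times 10^{-4}.
\]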
@@ -484,7 +484,7 @@ <h2>Mathematically</h2>
<div class="flex">
<div class="w-70">
<p>Consider <span class="math inline">\(x_1, x_2, \ldots, x_n\)</span> distributed <em>uniformly</em> within a <span class="math inline">\(p\)</span>-dimensional ball of radius 1. For a test point <span class="math inline">\(x\)</span> at the center of the ball, how far away are its <span class="math inline">\(k = n/10\)</span> nearest neighbours?</p>
<p><span class="small">(The picture on the right makes sense in 2D. It gives the wrong intuitions for higher dimensions!)</span></p>
<p><span class="small">(The picture on the right makes sense in 2D. However, it gives the wrong intuition for higher dimensions!)</span></p>
</div>
<div class="w-30">
<div class="cell" data-layout-align="center">
@@ -531,8 +531,8 @@ <h3 id="risk-decomposition-p-1">Risk decomposition (<span class="math inline">\(
<p><span class="small">Assuming optimal bandwidth of <span class="math inline">\(n^{-1/(4+p)}\)</span></span></p>
<p><span class="math display">\[
R_n^{(\mathrm{OLS})} =
- \underbrace{C_1^{(\mathrm{lin})}}_{\mathrm{bias}^2} +
- \underbrace{\tfrac{C_2^{(\mathrm{lin})}}{n/p}}_{\mathrm{var}} +
+ \underbrace{C_1^{(\mathrm{OLS})}}_{\mathrm{bias}^2} +
+ \underbrace{\tfrac{C_2^{(\mathrm{OLS})}}{n/p}}_{\mathrm{var}} +
\sigma^2,
\qquad
R_n^{(\mathrm{local})} =
Expand All @@ -544,8 +544,8 @@ <h3 id="risk-decomposition-p-1">Risk decomposition (<span class="math inline">\(
<!-- -->
<h3 id="observations">Observations</h3>
<ul>
<li><span class="math inline">\((C_1^{(\mathrm{local})} + C_2^{(\mathrm{local})}) / n^{4/(4+p)}\)</span> is relatively big, but <span class="math inline">\(C_2^{(\mathrm{lin})} / (n/p)\)</span> is relatively small.</li>
<li>So unless <span class="math inline">\(C_1^{(\mathrm{lin})}\)</span> is big, we should use the linear model.*<br>
<li><span class="math inline">\((C_1^{(\mathrm{local})} + C_2^{(\mathrm{local})}) / n^{4/(4+p)}\)</span> is relatively big, but <span class="math inline">\(C_2^{(\mathrm{OLS})} / (n/p)\)</span> is relatively small.</li>
<li>So unless <span class="math inline">\(C_1^{(\mathrm{OLS})}\)</span> is big, we should use the linear model.*<br>
</li>
</ul>
</div>
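To see the observation numerically, a sketch with every constant set to 1 (the \(C\)'s are unknown in practice; these values are purely illustrative):

```python
n = 10_000
for p in (1, 5, 20):
    ols_var = 1 / (n / p)              # C2_ols / (n/p), with C2_ols = 1
    local = 2 / n ** (4 / (4 + p))     # (C1_local + C2_local) / n^{4/(4+p)}, C's = 1
    print(f"p={p:2d}: OLS variance term {ols_var:.1e}, local bias^2 + variance term {local:.1e}")
```

The local term degrades rapidly as \(p\) grows, which is the sense in which the linear model wins unless its squared bias \(C_1^{(\mathrm{OLS})}\) is large.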