Commit: Built site for gh-pages
Quarto GHA Workflow Runner committed Oct 8, 2024
1 parent 7625139 commit 8fa831a
Showing 5 changed files with 69 additions and 69 deletions.
2 changes: 1 addition & 1 deletion .nojekyll
@@ -1 +1 @@
- 51465757
+ eafe6f9e
6 changes: 3 additions & 3 deletions schedule/slides/11-kernel-smoothers.html
@@ -408,7 +408,7 @@ <h2>1) The Bias of OLS</h2>
<p>A. Assume that <span class="math inline">\(y = x^\top \beta + \epsilon, \quad \epsilon \sim \mathcal N(0, \sigma^2).\)</span></p>
<p>B. Then <span class="math inline">\(E[\hat \beta_\mathrm{ols}] = E[ E[\hat \beta_\mathrm{ols} \mid \mathbf X] ]\)</span></p>
<p>C. <span class="math inline">\(E[\hat \beta_\mathrm{ols} \mid \mathbf X] = (\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top E[ \mathbf y \mid \mathbf X]\)</span></p>
- <p>D. <span class="math inline">\(E[ \mathbf y \mid \mathbf X] = \mathbf X^\top \beta\)</span></p>
+ <p>D. <span class="math inline">\(E[ \mathbf y \mid \mathbf X] = \mathbf X \beta\)</span></p>
<p>E. So <span class="math inline">\(E[\hat \beta_\mathrm{ols}] - \beta = E[(\mathbf X^\top \mathbf X)^{-1} \mathbf X^\top \mathbf X \beta] - \beta = \beta - \beta = 0\)</span>.</p>
<p><br>
<span class="secondary">Why did this proof not apply to the clicker question?</span><br>
@@ -431,7 +431,7 @@ <h2>1) The Bias of OLS</h2>
\end{align}
\]</span></p>
<p>In statistics speak, our model is <em>misspecified</em>.<br>
<span class="small">Ridge/lasso will still increase bias and decrease variance even under misspecification.</span></p>
<span class="small">Ridge/lasso will always increase bias and decrease variance, even under misspecification.</span></p>
</div>
</section>
<section id="why-does-ridge-regression-shrink-varinace" class="slide level2">
@@ -452,7 +452,7 @@ <h3 id="intuitively">Intuitively…</h3>
<h2>11 Local methods</h2>
<p><span class="secondary">Stat 406</span></p>
<p><span class="secondary">Geoff Pleiss, Trevor Campbell</span></p>
- <p>Last modified – 07 October 2024</p>
+ <p>Last modified – 08 October 2024</p>
<p><span class="math display">\[
\DeclareMathOperator*{\argmin}{argmin}
\DeclareMathOperator*{\argmax}{argmax}
18 changes: 9 additions & 9 deletions schedule/slides/12-why-smooth.html
@@ -400,7 +400,7 @@ <h3 id="variance">Variance</h3>
<h3 id="bias">Bias</h3>
<ul>
<li>Basis: bias is <em>fixed</em><br>
<span class="small">Assuming <span class="math inline">\(k\)</span> is fixed</span></li>
<span class="small">Assuming num. basis features is fixed</span></li>
<li>Local: bias depends on choice of bandwidth <span class="math inline">\(\sigma\)</span>.</li>
</ul>
</div>
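A small sketch of the bandwidth-bias link in the second bullet (the Gaussian kernel, the sine truth, and the bandwidth grid are illustrative assumptions): as the bandwidth grows, a Nadaraya-Watson estimate at a point drifts away from the truth.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.uniform(-2, 2, size=n)
y = np.sin(3 * x) + rng.normal(scale=0.3, size=n)

def nadaraya_watson(x0, x, y, bw):
    # Kernel-weighted average of y around x0 with Gaussian bandwidth bw.
    w = np.exp(-0.5 * ((x - x0) / bw) ** 2)
    return w @ y / w.sum()

for bw in (0.05, 0.3, 1.0):
    print(f"bw={bw}: f_hat(0.5) = {nadaraya_watson(0.5, x, y, bw):.2f}  (truth {np.sin(1.5):.2f})")
```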
@@ -444,8 +444,8 @@ <h3 id="what-do-you-notice">What do you notice?</h3>
<div class="fragment">
<ul>
<li>As <span class="math inline">\(n\)</span> increases, the optimal bandwidth <span class="math inline">\(\sigma\)</span> decreases</li>
<li>As <span class="math inline">\(n \to \infty\)</span>, <span class="math inline">\(R_n^{(\mathrm{basis})} \to C_1^{(\mathrm{basis})} + \sigma^2\)</span></li>
<li>As <span class="math inline">\(n \to \infty\)</span>, <span class="math inline">\(R_n^{(\mathrm{local})} \to \sigma^2\)</span></li>
<li><span class="math inline">\(R_n^{(\mathrm{basis})} \overset{n \to \infty}{\longrightarrow} C_1^{(\mathrm{basis})} + \sigma^2\)</span></li>
<li><span class="math inline">\(R_n^{(\mathrm{local})} \overset{n \to \infty}{\longrightarrow} \sigma^2\)</span></li>
</ul>
</div>
</div>
@@ -459,7 +459,7 @@ <h3 id="what-do-you-notice">What do you notice?</h3>
<section id="takeaway" class="slide level2">
<h2>Takeaway</h2>
<ol type="1">
- <li>Local methods are <em>consistent</em> (bias and variance go to 0 as <span class="math inline">\(n \to \infty\)</span>)</li>
+ <li>Local methods are <em>consistent universal approximators</em> (bias and variance go to 0 as <span class="math inline">\(n \to \infty\)</span>)</li>
<li>Fixed basis expansions are <em>biased</em> but have lower variance when <span class="math inline">\(n\)</span> is relatively small.<br>
<span class="small"><span class="math inline">\(\underbrace{O(1/n)}_{\text{basis var.}} &lt; \underbrace{O(1/n^{4/5})}_{\text{local var.}}\)</span></span></li>
</ol>
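Plugging a concrete sample size into point 2 makes the gap visible (numbers added here purely for illustration):

\[
n = 10^4: \qquad \underbrace{\tfrac{1}{n}}_{\text{basis var.}} = 10^{-4} \;<\; \underbrace{\tfrac{1}{n^{4/5}}}_{\text{local var.}} = 10^{-3.2} \approx 6.3 \times 10^{-4}.
\]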
@@ -484,7 +484,7 @@ <h2>Mathematically</h2>
<div class="flex">
<div class="w-70">
<p>Consider <span class="math inline">\(x_1, x_2, \ldots, x_n\)</span> distributed <em>uniformly</em> within a <span class="math inline">\(p\)</span>-dimensional ball of radius 1. For a test point <span class="math inline">\(x\)</span> at the center of the ball, how far away are its <span class="math inline">\(k = n/10\)</span> nearest neighbours?</p>
<p><span class="small">(The picture on the right makes sense in 2D. It gives the wrong intuitions for higher dimensions!)</span></p>
<p><span class="small">(The picture on the right makes sense in 2D. However, it gives the wrong intuition for higher dimensions!)</span></p>
</div>
<div class="w-30">
<div class="cell" data-layout-align="center">
@@ -531,8 +531,8 @@ <h3 id="risk-decomposition-p-1">Risk decomposition (<span class="math inline">\(
<p><span class="small">Assuming optimal bandwidth of <span class="math inline">\(n^{-1/(4+p)}\)</span></span></p>
<p><span class="math display">\[
R_n^{(\mathrm{OLS})} =
- \underbrace{C_1^{(\mathrm{lin})}}_{\mathrm{bias}^2} +
- \underbrace{\tfrac{C_2^{(\mathrm{lin})}}{n/p}}_{\mathrm{var}} +
+ \underbrace{C_1^{(\mathrm{OLS})}}_{\mathrm{bias}^2} +
+ \underbrace{\tfrac{C_2^{(\mathrm{OLS})}}{n/p}}_{\mathrm{var}} +
\sigma^2,
\qquad
R_n^{(\mathrm{local})} =
Expand All @@ -544,8 +544,8 @@ <h3 id="risk-decomposition-p-1">Risk decomposition (<span class="math inline">\(
<!-- -->
<h3 id="observations">Observations</h3>
<ul>
<li><span class="math inline">\((C_1^{(\mathrm{local})} + C_2^{(\mathrm{local})}) / n^{4/(4+p)}\)</span> is relatively big, but <span class="math inline">\(C_2^{(\mathrm{lin})} / (n/p)\)</span> is relatively small.</li>
<li>So unless <span class="math inline">\(C_1^{(\mathrm{lin})}\)</span> is big, we should use the linear model.*<br>
<li><span class="math inline">\((C_1^{(\mathrm{local})} + C_2^{(\mathrm{local})}) / n^{4/(4+p)}\)</span> is relatively big, but <span class="math inline">\(C_2^{(\mathrm{OLS})} / (n/p)\)</span> is relatively small.</li>
<li>So unless <span class="math inline">\(C_1^{(\mathrm{OLS})}\)</span> is big, we should use the linear model.*<br>
</li>
</ul>
</div>
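To see the observation numerically, a sketch with every constant set to 1 (the \(C\)'s are unknown in practice; these values are purely illustrative):

```python
n = 10_000
for p in (1, 5, 20):
    ols_var = 1 / (n / p)              # C2_ols / (n/p), with C2_ols = 1
    local = 2 / n ** (4 / (4 + p))     # (C1_local + C2_local) / n^{4/(4+p)}, C's = 1
    print(f"p={p:2d}: OLS variance term {ols_var:.1e}, local bias^2 + variance term {local:.1e}")
```

The local term degrades rapidly as \(p\) grows, which is the sense in which the linear model wins unless its squared bias \(C_1^{(\mathrm{OLS})}\) is large.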