Commit

Pushed A2

yashdave003 committed Mar 15, 2024
1 parent 610615a commit 082c693
Showing 3 changed files with 14 additions and 13 deletions.
6 changes: 3 additions & 3 deletions docs/projA2/projA2.html
Original file line number Diff line number Diff line change
@@ -244,14 +244,14 @@ <h2 class="anchored" data-anchor-id="question-5d-and-5f">Question 5d and 5f</h2>
<h3 class="anchored" data-anchor-id="general-debugging-tips">General Debugging Tips</h3>
<p>Question 5 is a challenging question that mirrors a lot of data science work in the real world: cleaning, exploring, and transforming data; fitting a model, working with a pre-defined pipeline and evaluating your model’s performance. Here are some general debugging tips to make the process easier:</p>
<ul>
-<li>Separate small tasks into helper functions, especially if you will execute them multiple times. For example, one-hot-encoding a categorical variable is a good helper function to make because you could perform it on multiple such columns. If you’re parsing a column with RegEx, it also might be a good idea to separate it to a helper function. This allows you to verify that you’re not making errors in these small tasks and prevents unknown bugs from appearing.</li>
+<li>Separate small tasks into helper functions, especially if you will execute them multiple times. For example, a helper function that one-hot encodes a categorical variable may be helpful as you could perform it on multiple such columns. If you’re parsing a column with RegEx, it also might be a good idea to separate it to a helper function. This allows you to verify that you’re not making errors in these small tasks and prevents unknown bugs from appearing.</li>
<li>Feel free to make new cells to play with the data! As long as you delete them afterward, it will not affect the autograder.</li>
-<li>The <code>feature_engine_final</code> looks daunting at first, but start small. First, try and implement a model with a single feature to get familiar with how the function works, then slowly experiment with adding one feature at a time and see how that affects your training RMSE.</li>
+<li>The <code>feature_engine_final</code> looks daunting at first, but start small. First, try and implement a model with a single feature to get familiar with how the pipeline works, then slowly experiment with adding one feature at a time and see how that affects your training RMSE.</li>
</ul>
</section>
<section id="my-training-rmse-is-low-but-my-validationtest-rmse-is-high" class="level3">
<h3 class="anchored" data-anchor-id="my-training-rmse-is-low-but-my-validationtest-rmse-is-high">My training RMSE is low, but my validation/test RMSE is high</h3>
-<p>Your model is likely overfitting to the training data and does not generalize to the test set. Recall the bias-variance tradeoff discussed in lecture. As you add more features and make your model more complex, it is expected that your training error will decrease. Your validation and test error may also decrease initially, but if your model is too complex, you run into this issue.</p>
+<p>Your model is likely overfitting to the training data and does not generalize to the test set. Recall the bias-variance tradeoff discussed in lecture. As you add more features and make your model more complex, it is expected that your training error will decrease. Your validation and test error may also decrease initially, but if your model is too complex, you end up with high validation and test RMSE.</p>
<center>
<img src="under_overfit.png" width="500">
</center>
14 changes: 7 additions & 7 deletions index.tex
@@ -196,7 +196,7 @@ \chapter*{About}\label{about}

\chapter{Jupyter 101}\label{jupyter-101}

-\begin{tcolorbox}[enhanced jigsaw, colframe=quarto-callout-note-color-frame, leftrule=.75mm, arc=.35mm, bottomrule=.15mm, rightrule=.15mm, left=2mm, coltitle=black, titlerule=0mm, colback=white, breakable, colbacktitle=quarto-callout-note-color!10!white, bottomtitle=1mm, opacitybacktitle=0.6, toptitle=1mm, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, toprule=.15mm, opacityback=0]
+\begin{tcolorbox}[enhanced jigsaw, colframe=quarto-callout-note-color-frame, colback=white, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, leftrule=.75mm, bottomtitle=1mm, coltitle=black, breakable, bottomrule=.15mm, arc=.35mm, colbacktitle=quarto-callout-note-color!10!white, opacityback=0, toptitle=1mm, titlerule=0mm, toprule=.15mm, left=2mm, rightrule=.15mm, opacitybacktitle=0.6]

If you're using a MacBook, replace \texttt{ctrl} with \texttt{cmd}.

@@ -1351,10 +1351,10 @@ \subsection{General Debugging Tips}\label{general-debugging-tips}
\tightlist
\item
Separate small tasks into helper functions, especially if you will
-execute them multiple times. For example, one-hot-encoding a
-categorical variable is a good helper function to make because you
-could perform it on multiple such columns. If you're parsing a column
-with RegEx, it also might be a good idea to separate it to a helper
+execute them multiple times. For example, a helper function that
+one-hot encodes a categorical variable may be helpful as you could
+perform it on multiple such columns. If you're parsing a column with
+RegEx, it also might be a good idea to separate it to a helper
function. This allows you to verify that you're not making errors in
these small tasks and prevents unknown bugs from appearing.
\item
@@ -1363,7 +1363,7 @@ \subsection{General Debugging Tips}\label{general-debugging-tips}
\item
The \texttt{feature\_engine\_final} looks daunting at first, but start
small. First, try and implement a model with a single feature to get
-familiar with how the function works, then slowly experiment with
+familiar with how the pipeline works, then slowly experiment with
adding one feature at a time and see how that affects your training
RMSE.
\end{itemize}
@@ -1376,7 +1376,7 @@ \subsection{My training RMSE is low, but my validation/test RMSE is
in lecture. As you add more features and make your model more complex,
it is expected that your training error will decrease. Your validation
and test error may also decrease initially, but if your model is too
-complex, you run into this issue.
+complex, you end up with high validation and test RMSE.
Consider visualizing the relationship between the features you've chosen
and the (Log) Sale Price and removing the features that are not highly
7 changes: 4 additions & 3 deletions projA2/projA2.md
@@ -18,12 +18,13 @@ jupyter: python3
### General Debugging Tips
Question 5 is a challenging question that mirrors a lot of data science work in the real world: cleaning, exploring, and transforming data; fitting a model, working with a pre-defined pipeline and evaluating your model's performance. Here are some general debugging tips to make the process easier:

-* Separate small tasks into helper functions, especially if you will execute them multiple times. For example, one-hot-encoding a categorical variable is a good helper function to make because you could perform it on multiple such columns. If you're parsing a column with RegEx, it also might be a good idea to separate it to a helper function. This allows you to verify that you're not making errors in these small tasks and prevents unknown bugs from appearing.
+* Separate small tasks into helper functions, especially if you will execute them multiple times. For example, a helper function that one-hot encodes a categorical variable may be helpful as you could perform it on multiple such columns. If you're parsing a column with RegEx, it also might be a good idea to separate it to a helper function. This allows you to verify that you're not making errors in these small tasks and prevents unknown bugs from appearing.
* Feel free to make new cells to play with the data! As long as you delete them afterward, it will not affect the autograder.
-* The `feature_engine_final` looks daunting at first, but start small. First, try and implement a model with a single feature to get familiar with how the function works, then slowly experiment with adding one feature at a time and see how that affects your training RMSE.
+* The `feature_engine_final` looks daunting at first, but start small. First, try and implement a model with a single feature to get familiar with how the pipeline works, then slowly experiment with adding one feature at a time and see how that affects your training RMSE.
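To make the helper-function tip concrete, here is a minimal sketch in pandas. The column names (`color`, `Description`), the `Rooms` feature, and the RegEx pattern are hypothetical stand-ins, not the project's actual data:

```python
import pandas as pd

def ohe_column(df: pd.DataFrame, col: str) -> pd.DataFrame:
    """Replace one categorical column with one-hot indicator columns."""
    dummies = pd.get_dummies(df[col], prefix=col)
    return df.drop(columns=[col]).join(dummies)

def extract_rooms(df: pd.DataFrame, col: str = "Description") -> pd.DataFrame:
    """Parse a room count out of a free-text column with RegEx."""
    out = df.copy()
    out["Rooms"] = out[col].str.extract(r"(\d+)\s+rooms", expand=False).astype(float)
    return out

# Each task lives in its own function, so each can be sanity-checked alone
# on a tiny hand-made frame before it goes anywhere near the real pipeline.
tiny = pd.DataFrame({"color": ["red", "blue", "red"],
                     "Description": ["5 rooms total", "3 rooms total", "4 rooms total"]})
encoded = ohe_column(tiny, "color")
parsed = extract_rooms(tiny)
```

Because the helpers are separate, you can call `ohe_column` on several categorical columns in a loop and verify each transformation on a toy frame like `tiny` before trusting it inside a larger feature-engineering function.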

### My training RMSE is low, but my validation/test RMSE is high
-Your model is likely overfitting to the training data and does not generalize to the test set. Recall the bias-variance tradeoff discussed in lecture. As you add more features and make your model more complex, it is expected that your training error will decrease. Your validation and test error may also decrease initially, but if your model is too complex, you run into this issue.
+
+Your model is likely overfitting to the training data and does not generalize to the test set. Recall the bias-variance tradeoff discussed in lecture. As you add more features and make your model more complex, it is expected that your training error will decrease. Your validation and test error may also decrease initially, but if your model is too complex, you end up with high validation and test RMSE.

<center><img src="under_overfit.png" width="500"></center>
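The picture above can be reproduced numerically. The sketch below uses toy sine-wave data and plain polynomial features (an illustration only, not the project's data or pipeline): training RMSE keeps shrinking as the degree grows, while the underfit straight line stays bad on both splits.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import PolynomialFeatures

# Toy data: a nonlinear signal plus noise.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(120, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.3, size=120)
x_tr, x_val, y_tr, y_val = x[:80], x[80:], y[:80], y[80:]  # simple split

results = {}
for degree in (1, 3, 15):
    feats = PolynomialFeatures(degree).fit(x_tr)
    model = LinearRegression().fit(feats.transform(x_tr), y_tr)
    # (train RMSE, validation RMSE) for this model complexity
    results[degree] = tuple(
        mean_squared_error(t, model.predict(feats.transform(X))) ** 0.5
        for X, t in ((x_tr, y_tr), (x_val, y_val))
    )
    print(f"degree {degree:2d}: train RMSE {results[degree][0]:.3f}, "
          f"val RMSE {results[degree][1]:.3f}")
```

Running this shows the left side of the curve (degree 1, high error everywhere) and the monotone drop in training error; comparing the degree-3 and degree-15 validation numbers is a quick way to see when extra complexity stops paying off.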

Expand Down
