update proja2 numerical overflow

DS-100 · Nov 1, 2024 · 367cc8d
1 parent 6998af7
commit 367cc8d

Showing 4 changed files with 16 additions and 12 deletions.
diff --git a/docs/projA2/projA2.html b/docs/projA2/projA2.html
@@ -357,8 +357,8 @@ <h3 class="anchored" data-anchor-id="wrong-number-of-lines-__-instead-of-__">“
 </section>
 <section id="numerical-overflow" class="level3">
 <h3 class="anchored" data-anchor-id="numerical-overflow">Numerical Overflow</h3>
-<p>This error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs <code>submission_df["Value"].describe()</code>, which returns some summary statistics of your predictions. Your maximum value for <code>Log Sale Price</code> should not be over 25.</p>
-<p>For your reference, a log sale price of 25 corresponds to a sale price of <span class="math inline">\(e^{25} \approx\)</span> 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the <em>training</em> data or experimenting with new features so that your model generalizes better.</p>
+<p>This error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs <code>submission_df["Value"].describe()</code>, which returns some summary statistics of your predictions. Your maximum value for <code>Log Sale Price</code> should not be over 25, and your minimum value should not be below 0.</p>
+<p>For your reference, a log sale price of 25 corresponds to a sale price of <span class="math inline">\(e^{25} \approx\)</span> 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of <span class="math inline">\(e^{-1} \approx \$0.37\)</span>, which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the <em>training</em> data or experimenting with new features so that your model generalizes better.</p>
 
 
 </section>

diff --git a/docs/search.json b/docs/search.json
@@ -414,7 +414,7 @@
     "href": "projA2/projA2.html#gradescope",
     "title": "Project A2 Common Questions",
     "section": "Gradescope",
-    "text": "Gradescope\n\nI don’t have many Gradescope submissions left\nIf you’re almost out of Gradescope submissions, try using k-fold cross-validation to check the accuracy of your model. Results from cross-validation will be closer to the test set accuracy than results from the training data. Feel free to take a look at the code used in Lecture 16 if you’re confused on how to implement cross-validation.\n\n\n“Wrong number of lines ( __ instead of __ )”\nThis occurs when you remove outliers when preprocessing the testing data. Please do not remove any outliers from your test set. You may only remove outliers in training data.\n\n\nNumerical Overflow\nThis error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs submission_df[\"Value\"].describe(), which returns some summary statistics of your predictions. Your maximum value for Log Sale Price should not be over 25.\nFor your reference, a log sale price of 25 corresponds to a sale price of \\(e^{25} \\approx\\) 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the training data or experimenting with new features so that your model generalizes better.",
+    "text": "Gradescope\n\nI don’t have many Gradescope submissions left\nIf you’re almost out of Gradescope submissions, try using k-fold cross-validation to check the accuracy of your model. Results from cross-validation will be closer to the test set accuracy than results from the training data. Feel free to take a look at the code used in Lecture 16 if you’re confused on how to implement cross-validation.\n\n\n“Wrong number of lines ( __ instead of __ )”\nThis occurs when you remove outliers when preprocessing the testing data. Please do not remove any outliers from your test set. You may only remove outliers in training data.\n\n\nNumerical Overflow\nThis error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs submission_df[\"Value\"].describe(), which returns some summary statistics of your predictions. Your maximum value for Log Sale Price should not be over 25, and your minimum value should not be below 0.\nFor your reference, a log sale price of 25 corresponds to a sale price of \\(e^{25} \\approx\\) 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of \\(e^{-1} \\approx \\$0.37\\), which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the training data or experimenting with new features so that your model generalizes better.",
     "crumbs": [
       "<span class='chapter-number'>8</span>  <span class='chapter-title'>Project A2 Common Questions</span>"
     ]

diff --git a/index.tex b/index.tex
@@ -191,7 +191,7 @@ \chapter*{About}\label{about}
 
 \chapter{Jupyter 101}\label{jupyter-101}
 
-\begin{tcolorbox}[enhanced jigsaw, opacitybacktitle=0.6, left=2mm, colbacktitle=quarto-callout-note-color!10!white, opacityback=0, bottomtitle=1mm, toptitle=1mm, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, colframe=quarto-callout-note-color-frame, arc=.35mm, bottomrule=.15mm, rightrule=.15mm, breakable, titlerule=0mm, toprule=.15mm, leftrule=.75mm, colback=white, coltitle=black]
+\begin{tcolorbox}[enhanced jigsaw, toptitle=1mm, opacitybacktitle=0.6, opacityback=0, colbacktitle=quarto-callout-note-color!10!white, arc=.35mm, breakable, bottomtitle=1mm, rightrule=.15mm, toprule=.15mm, left=2mm, colframe=quarto-callout-note-color-frame, colback=white, titlerule=0mm, coltitle=black, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, leftrule=.75mm, bottomrule=.15mm]
 
 If you're using a MacBook, replace \texttt{ctrl} with \texttt{cmd}.
 
@@ -1633,17 +1633,21 @@ \subsection{``Wrong number of lines ( \_\_ instead of \_\_
 
 \subsection{Numerical Overflow}\label{numerical-overflow}
 
-This error is caused by overly large predictions that create an
-extremely large RMSE. The cell before you generate your submission runs
+This error can be caused by negative predictions or overly large
+predictions that create an extremely large RMSE. The cell before you
+generate your submission runs
 \texttt{submission\_df{[}"Value"{]}.describe()}, which returns some
 summary statistics of your predictions. Your maximum value for
-\texttt{Log\ Sale\ Price} should not be over 25.
+\texttt{Log\ Sale\ Price} should not be over 25, and your minimum value
+should not be below 0.
 
 For your reference, a log sale price of 25 corresponds to a sale price
 of \(e^{25} \approx\) 70 billion, which is far bigger than anything
-found in the dataset. If you see such large predictions, you can try
-removing outliers from the \emph{training} data or experimenting with
-new features so that your model generalizes better.
+found in the dataset. A log sale price of -1 corresponds to a sale price
+of \(e^{-1} \approx \$0.37\), which is also not reasonable. If you see
+such large or small predictions, you can try removing outliers from the
+\emph{training} data or experimenting with new features so that your
+model generalizes better.
 
 
 

diff --git a/projA2/projA2.md b/projA2/projA2.md
@@ -124,6 +124,6 @@ This occurs when you remove outliers when preprocessing the testing data. *Pleas
 
 ### Numerical Overflow 
 
-This error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs `submission_df["Value"].describe()`, which returns some summary statistics of your predictions. Your maximum value for `Log Sale Price` should not be over 25.
+This error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs `submission_df["Value"].describe()`, which returns some summary statistics of your predictions. Your maximum value for `Log Sale Price` should not be over 25, and your minimum value should not be below 0.
 
-For your reference, a log sale price of 25 corresponds to a sale price of $e^{25} \approx$ 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the *training* data or experimenting with new features so that your model generalizes better. 
+For your reference, a log sale price of 25 corresponds to a sale price of $e^{25} \approx$ 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of $e^{-1} \approx \$0.37$, which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the *training* data or experimenting with new features so that your model generalizes better.
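
Note on the range check described in this change: it can be sketched in a few lines of Python. This is only an illustration, not code from the project notebook. The helper name `check_log_predictions` and the sample values are hypothetical; the bounds 0 and 25 are the ones quoted in the updated text. The helper accepts any iterable of numbers, so it could presumably be called on `submission_df["Value"]` alongside `.describe()`.

```python
import math

# Bounds quoted in the guide text: log sale prices should fall in [0, 25].
LOG_PRICE_MIN = 0.0
LOG_PRICE_MAX = 25.0


def check_log_predictions(values, low=LOG_PRICE_MIN, high=LOG_PRICE_MAX):
    """Summarize predictions and count those outside [low, high].

    Hypothetical helper for illustration; not part of the project notebook.
    """
    values = [float(v) for v in values]
    out_of_range = [v for v in values if v < low or v > high]
    return {
        "min": min(values),
        "max": max(values),
        "n_out_of_range": len(out_of_range),
        "ok": not out_of_range,
    }


# Made-up predictions: one negative and one far too large.
example = [11.2, 12.5, -1.0, 30.0]
report = check_log_predictions(example)
print(report)

# Dollar values implied by the thresholds discussed in the text.
print(f"exp(25) = {math.exp(25):,.0f}")  # on the order of 70 billion
print(f"exp(-1) = {math.exp(-1):.2f}")   # about 0.37
```

If `ok` comes back `False`, the guidance in the text applies: revisit outlier removal on the training data or the feature set, rather than clipping the test predictions.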