diff --git a/docs/projA2/projA2.html b/docs/projA2/projA2.html
index 2cbf71d..1223fab 100644
--- a/docs/projA2/projA2.html
+++ b/docs/projA2/projA2.html
@@ -357,8 +357,8 @@
 Numerical Overflow
-This error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs submission_df["Value"].describe(), which returns some summary statistics of your predictions. Your maximum value for Log Sale Price should not be over 25.
-For your reference, a log sale price of 25 corresponds to a sale price of \(e^{25} \approx\) 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the training data or experimenting with new features so that your model generalizes better.
+This error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs submission_df["Value"].describe(), which returns some summary statistics of your predictions. Your maximum value for Log Sale Price should not be over 25, and your minimum value should not be below 0.
+For your reference, a log sale price of 25 corresponds to a sale price of \(e^{25} \approx\) 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of \(e^{-1} \approx \$0.37\), which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the training data or experimenting with new features so that your model generalizes better.

diff --git a/docs/search.json b/docs/search.json
index b3449cb..aee640e 100644
--- a/docs/search.json
+++ b/docs/search.json
@@ -414,7 +414,7 @@
 "href": "projA2/projA2.html#gradescope",
 "title": "Project A2 Common Questions",
 "section": "Gradescope",
-"text": "Gradescope\n\nI don’t have many Gradescope submissions left\nIf you’re almost out of Gradescope submissions, try using k-fold cross-validation to check the accuracy of your model. Results from cross-validation will be closer to the test set accuracy than results from the training data. Feel free to take a look at the code used in Lecture 16 if you’re confused on how to implement cross-validation.\n\n\n“Wrong number of lines ( __ instead of __ )”\nThis occurs when you remove outliers when preprocessing the testing data. Please do not remove any outliers from your test set. You may only remove outliers in training data.\n\n\nNumerical Overflow\nThis error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs submission_df[\"Value\"].describe(), which returns some summary statistics of your predictions. Your maximum value for Log Sale Price should not be over 25.\nFor your reference, a log sale price of 25 corresponds to a sale price of \\(e^{25} \\approx\\) 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the training data or experimenting with new features so that your model generalizes better.",
+"text": "Gradescope\n\nI don’t have many Gradescope submissions left\nIf you’re almost out of Gradescope submissions, try using k-fold cross-validation to check the accuracy of your model. Results from cross-validation will be closer to the test set accuracy than results from the training data. Feel free to take a look at the code used in Lecture 16 if you’re confused on how to implement cross-validation.\n\n\n“Wrong number of lines ( __ instead of __ )”\nThis occurs when you remove outliers when preprocessing the testing data. Please do not remove any outliers from your test set. You may only remove outliers in training data.\n\n\nNumerical Overflow\nThis error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs submission_df[\"Value\"].describe(), which returns some summary statistics of your predictions. Your maximum value for Log Sale Price should not be over 25, and your minimum value should not be below 0.\nFor your reference, a log sale price of 25 corresponds to a sale price of \\(e^{25} \\approx\\) 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of \\(e^{-1} \\approx \\$0.37\\), which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the training data or experimenting with new features so that your model generalizes better.",
 "crumbs": [
 "8  Project A2 Common Questions"
 ]
diff --git a/index.tex b/index.tex
index 7d6ff0e..fbdf27b 100644
--- a/index.tex
+++ b/index.tex
@@ -191,7 +191,7 @@ \chapter*{About}\label{about}

 \chapter{Jupyter 101}\label{jupyter-101}

-\begin{tcolorbox}[enhanced jigsaw, opacitybacktitle=0.6, left=2mm, colbacktitle=quarto-callout-note-color!10!white, opacityback=0, bottomtitle=1mm, toptitle=1mm, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, colframe=quarto-callout-note-color-frame, arc=.35mm, bottomrule=.15mm, rightrule=.15mm, breakable, titlerule=0mm, toprule=.15mm, leftrule=.75mm, colback=white, coltitle=black]
+\begin{tcolorbox}[enhanced jigsaw, toptitle=1mm, opacitybacktitle=0.6, opacityback=0, colbacktitle=quarto-callout-note-color!10!white, arc=.35mm, breakable, bottomtitle=1mm, rightrule=.15mm, toprule=.15mm, left=2mm, colframe=quarto-callout-note-color-frame, colback=white, titlerule=0mm, coltitle=black, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, leftrule=.75mm, bottomrule=.15mm]

 If you're using a MacBook, replace \texttt{ctrl} with \texttt{cmd}.

@@ -1633,17 +1633,21 @@ \subsection{``Wrong number of lines ( \_\_ instead of \_\_

 \subsection{Numerical Overflow}\label{numerical-overflow}

-This error is caused by overly large predictions that create an
-extremely large RMSE. The cell before you generate your submission runs
+This error can be caused by negative predictions or overly large
+predictions that create an extremely large RMSE. The cell before you
+generate your submission runs
 \texttt{submission\_df{[}"Value"{]}.describe()}, which returns some
 summary statistics of your predictions. Your maximum value for
-\texttt{Log\ Sale\ Price} should not be over 25.
+\texttt{Log\ Sale\ Price} should not be over 25, and your minimum value
+should not be below 0.

 For your reference, a log sale price of 25 corresponds to a sale price
 of \(e^{25} \approx\) 70 billion, which is far bigger than anything
-found in the dataset. If you see such large predictions, you can try
-removing outliers from the \emph{training} data or experimenting with
-new features so that your model generalizes better.
+found in the dataset. A log sale price of -1 corresponds to a sale price
+of \(e^{-1} \approx \$0.37\), which is also not reasonable. If you see
+such large or small predictions, you can try removing outliers from the
+\emph{training} data or experimenting with new features so that your
+model generalizes better.
diff --git a/projA2/projA2.md b/projA2/projA2.md
index 4dbb072..118b90d 100644
--- a/projA2/projA2.md
+++ b/projA2/projA2.md
@@ -124,6 +124,6 @@ This occurs when you remove outliers when preprocessing the testing data. *Pleas

 ### Numerical Overflow

-This error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs `submission_df["Value"].describe()`, which returns some summary statistics of your predictions. Your maximum value for `Log Sale Price` should not be over 25.
+This error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs `submission_df["Value"].describe()`, which returns some summary statistics of your predictions. Your maximum value for `Log Sale Price` should not be over 25, and your minimum value should not be below 0.

-For your reference, a log sale price of 25 corresponds to a sale price of $e^{25} \approx$ 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the *training* data or experimenting with new features so that your model generalizes better.
\ No newline at end of file
+For your reference, a log sale price of 25 corresponds to a sale price of $e^{25} \approx$ 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of $e^{-1} \approx \$0.37$, which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the *training* data or experimenting with new features so that your model generalizes better.
\ No newline at end of file
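The updated Numerical Overflow answer asks students to keep log sale price predictions between 0 and 25. The following is a minimal Python sketch of that check. It assumes the predictions are on the log scale in `submission_df["Value"]`, as the FAQ describes. The helper `out_of_range_predictions` and the sample values are hypothetical, and the bounds are taken from the FAQ text:

```python
import numpy as np
import pandas as pd


def out_of_range_predictions(preds: pd.Series, low: float = 0.0, high: float = 25.0) -> pd.Series:
    """Return the log sale price predictions that fall outside [low, high]."""
    mask = (preds < low) | (preds > high)
    return preds[mask]


# Hypothetical log-scale predictions; in the project these would be
# the values in submission_df["Value"].
submission_df = pd.DataFrame({"Value": [11.8, 12.4, -1.0, 26.3, 13.1]})

print(submission_df["Value"].describe())  # the summary the FAQ refers to

bad = out_of_range_predictions(submission_df["Value"])
print(f"{len(bad)} prediction(s) outside [0, 25]")
# Exponentiate to see which sale prices these log predictions imply.
print(np.exp(bad))

# Sanity-check the figures quoted in the FAQ.
print(f"e^25 is about {np.exp(25):.3g}")  # roughly 7.2e10, i.e. about 70 billion
print(f"e^-1 is about {np.exp(-1):.2f}")  # about 0.37
```

Any rows the helper returns are candidates for a closer look before spending a Gradescope submission. The bounds are parameters, so you can tighten them toward the range actually seen in the training data.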