Commit

update proja2 numerical overflow
nsreddy16 committed Nov 1, 2024
1 parent 6998af7 commit 367cc8d
Showing 4 changed files with 16 additions and 12 deletions.
4 changes: 2 additions & 2 deletions docs/projA2/projA2.html
@@ -357,8 +357,8 @@ <h3 class="anchored" data-anchor-id="wrong-number-of-lines-__-instead-of-__">“
</section>
<section id="numerical-overflow" class="level3">
<h3 class="anchored" data-anchor-id="numerical-overflow">Numerical Overflow</h3>
-<p>This error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs <code>submission_df["Value"].describe()</code>, which returns some summary statistics of your predictions. Your maximum value for <code>Log Sale Price</code> should not be over 25.</p>
-<p>For your reference, a log sale price of 25 corresponds to a sale price of <span class="math inline">\(e^{25} \approx\)</span> 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the <em>training</em> data or experimenting with new features so that your model generalizes better.</p>
+<p>This error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs <code>submission_df["Value"].describe()</code>, which returns some summary statistics of your predictions. Your maximum value for <code>Log Sale Price</code> should not be over 25, and your minimum value should not be below 0.</p>
+<p>For your reference, a log sale price of 25 corresponds to a sale price of <span class="math inline">\(e^{25} \approx\)</span> 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of <span class="math inline">\(e^{-1} \approx \$0.37\)</span>, which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the <em>training</em> data or experimenting with new features so that your model generalizes better.</p>


</section>
2 changes: 1 addition & 1 deletion docs/search.json
@@ -414,7 +414,7 @@
"href": "projA2/projA2.html#gradescope",
"title": "Project A2 Common Questions",
"section": "Gradescope",
-"text": "Gradescope\n\nI don’t have many Gradescope submissions left\nIf you’re almost out of Gradescope submissions, try using k-fold cross-validation to check the accuracy of your model. Results from cross-validation will be closer to the test set accuracy than results from the training data. Feel free to take a look at the code used in Lecture 16 if you’re confused on how to implement cross-validation.\n\n\n“Wrong number of lines ( __ instead of __ )”\nThis occurs when you remove outliers when preprocessing the testing data. Please do not remove any outliers from your test set. You may only remove outliers in training data.\n\n\nNumerical Overflow\nThis error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs submission_df[\"Value\"].describe(), which returns some summary statistics of your predictions. Your maximum value for Log Sale Price should not be over 25.\nFor your reference, a log sale price of 25 corresponds to a sale price of \\(e^{25} \\approx\\) 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the training data or experimenting with new features so that your model generalizes better.",
+"text": "Gradescope\n\nI don’t have many Gradescope submissions left\nIf you’re almost out of Gradescope submissions, try using k-fold cross-validation to check the accuracy of your model. Results from cross-validation will be closer to the test set accuracy than results from the training data. Feel free to take a look at the code used in Lecture 16 if you’re confused on how to implement cross-validation.\n\n\n“Wrong number of lines ( __ instead of __ )”\nThis occurs when you remove outliers when preprocessing the testing data. Please do not remove any outliers from your test set. You may only remove outliers in training data.\n\n\nNumerical Overflow\nThis error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs submission_df[\"Value\"].describe(), which returns some summary statistics of your predictions. Your maximum value for Log Sale Price should not be over 25, and your minimum value should not be below 0.\nFor your reference, a log sale price of 25 corresponds to a sale price of \\(e^{25} \\approx\\) 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of \\(e^{-1} \\approx \\$0.37\\), which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the training data or experimenting with new features so that your model generalizes better.",
"crumbs": [
"<span class='chapter-number'>8</span>  <span class='chapter-title'>Project A2 Common Questions</span>"
]
18 changes: 11 additions & 7 deletions index.tex
@@ -191,7 +191,7 @@ \chapter*{About}\label{about}

\chapter{Jupyter 101}\label{jupyter-101}

-\begin{tcolorbox}[enhanced jigsaw, opacitybacktitle=0.6, left=2mm, colbacktitle=quarto-callout-note-color!10!white, opacityback=0, bottomtitle=1mm, toptitle=1mm, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, colframe=quarto-callout-note-color-frame, arc=.35mm, bottomrule=.15mm, rightrule=.15mm, breakable, titlerule=0mm, toprule=.15mm, leftrule=.75mm, colback=white, coltitle=black]
+\begin{tcolorbox}[enhanced jigsaw, toptitle=1mm, opacitybacktitle=0.6, opacityback=0, colbacktitle=quarto-callout-note-color!10!white, arc=.35mm, breakable, bottomtitle=1mm, rightrule=.15mm, toprule=.15mm, left=2mm, colframe=quarto-callout-note-color-frame, colback=white, titlerule=0mm, coltitle=black, title=\textcolor{quarto-callout-note-color}{\faInfo}\hspace{0.5em}{Note}, leftrule=.75mm, bottomrule=.15mm]

If you're using a MacBook, replace \texttt{ctrl} with \texttt{cmd}.

@@ -1633,17 +1633,21 @@ \subsection{``Wrong number of lines ( \_\_ instead of \_\_
\subsection{Numerical Overflow}\label{numerical-overflow}
-This error is caused by overly large predictions that create an
-extremely large RMSE. The cell before you generate your submission runs
+This error can be caused by negative predictions or overly large
+predictions that create an extremely large RMSE. The cell before you
+generate your submission runs
\texttt{submission\_df{[}"Value"{]}.describe()}, which returns some
summary statistics of your predictions. Your maximum value for
-\texttt{Log\ Sale\ Price} should not be over 25.
+\texttt{Log\ Sale\ Price} should not be over 25, and your minimum value
+should not be below 0.
For your reference, a log sale price of 25 corresponds to a sale price
of \(e^{25} \approx\) 70 billion, which is far bigger than anything
-found in the dataset. If you see such large predictions, you can try
-removing outliers from the \emph{training} data or experimenting with
-new features so that your model generalizes better.
+found in the dataset. A log sale price of -1 corresponds to a sale price
+of \(e^{-1} \approx \$0.37\), which is also not reasonable. If you see
+such large or small predictions, you can try removing outliers from the
+\emph{training} data or experimenting with new features so that your
+model generalizes better.
4 changes: 2 additions & 2 deletions projA2/projA2.md
@@ -124,6 +124,6 @@ This occurs when you remove outliers when preprocessing the testing data. *Pleas

### Numerical Overflow

-This error is caused by overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs `submission_df["Value"].describe()`, which returns some summary statistics of your predictions. Your maximum value for `Log Sale Price` should not be over 25.
+This error can be caused by negative predictions or overly large predictions that create an extremely large RMSE. The cell before you generate your submission runs `submission_df["Value"].describe()`, which returns some summary statistics of your predictions. Your maximum value for `Log Sale Price` should not be over 25, and your minimum value should not be below 0.

-For your reference, a log sale price of 25 corresponds to a sale price of $e^{25} \approx$ 70 billion, which is far bigger than anything found in the dataset. If you see such large predictions, you can try removing outliers from the *training* data or experimenting with new features so that your model generalizes better.
+For your reference, a log sale price of 25 corresponds to a sale price of $e^{25} \approx$ 70 billion, which is far bigger than anything found in the dataset. A log sale price of -1 corresponds to a sale price of $e^{-1} \approx \$0.37$, which is also not reasonable. If you see such large or small predictions, you can try removing outliers from the *training* data or experimenting with new features so that your model generalizes better.
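
The range check described in this section can be sketched in a few lines of pandas. This is only an illustration, not part of the project code: the helper name `find_out_of_range`, the constants `LOG_PRICE_MIN` and `LOG_PRICE_MAX`, and the sample values are all assumptions made for the example, and `submission_df` here is a small stand-in for the real submission DataFrame.

```python
import numpy as np
import pandas as pd

# Bounds taken from the guidance above: log sale prices should stay within [0, 25].
LOG_PRICE_MIN = 0.0
LOG_PRICE_MAX = 25.0


def find_out_of_range(predictions: pd.Series,
                      low: float = LOG_PRICE_MIN,
                      high: float = LOG_PRICE_MAX) -> pd.Series:
    """Return the predictions outside [low, high]; NaN values are flagged too."""
    in_range = predictions.between(low, high)  # inclusive on both ends
    return predictions[~in_range]


# Hypothetical predictions standing in for submission_df["Value"].
submission_df = pd.DataFrame({"Value": [11.8, 12.4, -1.0, 26.3, 12.9]})

# Same summary the notebook cell prints before generating the submission.
print(submission_df["Value"].describe())

# Rows that would likely trigger the overflow error, plus their implied sale prices.
bad = find_out_of_range(submission_df["Value"])
print(bad)
print(np.exp(bad))
```

If `bad` is non-empty, the suggestions above apply: revisit outliers in the training data or the feature set rather than altering the test set.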
