Commit

Tighten language around predictor/variable
mine-cetinkaya-rundel committed Oct 2, 2023
1 parent bad2e29 commit cdedcf3
Showing 1 changed file with 19 additions and 19 deletions.
38 changes: 19 additions & 19 deletions 08-model-mlr.qmd
However, when there is only $k = 1$ predictor, adjusted $R^2$ is very close to $R^2$.
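One way to see this is from an equivalent closed form of adjusted $R^2$ for a model with $n$ observations and $k$ predictors (our restatement, not notation from the chapter):

$$
R^2_{adj} = 1 - \frac{(1 - R^2)(n - 1)}{n - k - 1}
$$

With $k = 1$, the factor $(n - 1)/(n - 2)$ is barely above 1, so $R^2_{adj}$ and $R^2$ differ by only $(1 - R^2)/(n - 2)$.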
## Model selection {#model-selection}
The best model is not always the most complicated.
Sometimes including predictors that are not evidently important can actually reduce the accuracy of predictions.
In this section, we discuss model selection strategies, which will help us eliminate predictors from the model that are found to be less important.
It's common (and hip, at least in the statistical world) to refer to models that have undergone such predictor pruning as **parsimonious**.
```{r}
#| include: false
terms_chp_8 <- c(terms_chp_8, "full model")
```
### Stepwise selection
Two common strategies for adding or removing predictors in a multiple regression model are called backward elimination and forward selection.
These techniques are often referred to as **stepwise selection** strategies, because they add or delete one variable at a time as they "step" through the candidate predictors.
```{r}
#| include: false
terms_chp_8 <- c(terms_chp_8, "stepwise selection")
```
**Backward elimination** starts with the full model (the model that includes all potential predictors). Predictors are eliminated one-at-a-time from the model until we cannot improve the model any further.
**Forward selection** is the reverse of the backward elimination technique.
Instead of eliminating predictors one-at-a-time, we add predictors one-at-a-time until we cannot find any predictors that improve the model any further.
```{r}
#| include: false
terms_chp_8 <- c(terms_chp_8, "backward elimination", "forward selection")
```
An important consideration in implementing either of these stepwise selection strategies is the criterion used to decide whether to eliminate or add a predictor.
One commonly used decision criterion is adjusted $R^2$.
When using adjusted $R^2$ as the decision criterion, we seek to eliminate or add predictors depending on whether they lead to the largest improvement in adjusted $R^2$, and we stop when adding or eliminating another predictor does not lead to further improvement in adjusted $R^2$.
Adjusted $R^2$ describes the strength of a model fit, and it is a useful tool for evaluating which predictors are adding value to the model, where *adding value* means they are (likely) improving the accuracy in predicting future outcomes.
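As a concrete illustration (our sketch, not code from the chapter), one backward elimination step with adjusted $R^2$ as the criterion might look like the following in R, assuming `full_model` is a fitted `lm()` object containing all candidate predictors; `drop_one_step()` is a hypothetical name:

```{r}
#| eval: false
# A minimal sketch of one backward elimination step using adjusted R^2.
# `drop_one_step()` and `full_model` are hypothetical names.
drop_one_step <- function(model) {
  current <- summary(model)$adj.r.squared
  candidates <- attr(terms(model), "term.labels")
  # Refit the model with each predictor removed in turn
  fits <- lapply(candidates, function(p) {
    update(model, as.formula(paste(". ~ . -", p)))
  })
  adj_r2 <- vapply(fits, function(f) summary(f)$adj.r.squared, numeric(1))
  if (max(adj_r2) > current) {
    fits[[which.max(adj_r2)]]  # drop the predictor whose removal helps most
  } else {
    model  # no removal improves adjusted R^2, so we stop
  }
}
```

Calling `drop_one_step(full_model)` repeatedly, until the returned model no longer changes, carries out the full backward elimination strategy.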
In this example, we have arrived at the same model that we identified from backward elimination.
::: {.important data-latex=""}
**Stepwise selection strategies.**
Backward elimination begins with the model having the largest number of predictors and eliminates predictors one-by-one until we are satisfied that all remaining predictors are important to the model.
Forward selection starts with no predictors included in the model, then it adds in predictors according to their importance until no other important predictors are found.
Notice that, for both methods, we have always chosen to retain the model with the largest adjusted $R^2$ value, even if the difference is less than half a percent (e.g., 0.2597 versus 0.2598).
One could argue that the difference between these two models is negligible, as they both explain nearly the same amount of variability in the `interest_rate`.
These negligible differences are an important aspect of model selection.
This "threshold" is what you will then use to decide if one model is "better" than the other.
Using meaningful thresholds in model selection requires more critical thinking about what the adjusted $R^2$ values mean.
Additionally, backward elimination and forward selection sometimes arrive at different final models.
This is because the decision for whether to include a given predictor or not depends on the other predictors that are already in the model.
With forward selection, you start with a model that includes no predictors and add predictors one at a time.
In backward elimination, you start with a model that includes all of the potential predictors and remove predictors one at a time.
How much a given predictor changes the percentage of the variability in the outcome that is explained by the model depends on what other predictors are in the model.
This is especially important if the predictor variables are correlated with each other.
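A small simulation (ours, not the chapter's) makes this concrete: when two predictors are correlated, the variability one of them explains depends on whether the other is already in the model.

```{r}
#| eval: false
# Simulated illustration: x1 and x2 are correlated, so the value x2 adds
# depends on whether x1 is already in the model.
set.seed(25)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.4)  # x2 is strongly correlated with x1
y  <- x1 + rnorm(100)            # y depends on x1 only

summary(lm(y ~ x2))$r.squared       # on its own, x2 explains quite a bit
summary(lm(y ~ x1))$r.squared       # x1 alone explains more
summary(lm(y ~ x1 + x2))$r.squared  # adding x2 to x1 adds very little
```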
There is no "one size fits all" model selection strategy, which is why there are so many different model selection methods.
We hope you walk away from this exploration understanding how stepwise selection works.
Stepwise selection using adjusted $R^2$ as the decision criterion is one of many commonly used model selection strategies.
Stepwise selection can also be carried out with decision criteria other than adjusted $R^2$, such as p-values, which you'll learn about in @sec-inf-model-slr onward, or AIC (Akaike information criterion) or BIC (Bayesian information criterion), which you might learn about in more advanced courses.
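For instance, base R's `step()` function carries out stepwise selection with AIC as its default criterion; in this sketch, `full_model` and the `loans` data frame are hypothetical placeholders:

```{r}
#| eval: false
# Stepwise selection with AIC via base R's step().
# `full_model` is a hypothetical fitted lm() with all candidate predictors;
# `loans` is a hypothetical data frame with an `interest_rate` column.
backward_aic <- step(full_model, direction = "backward", trace = FALSE)
forward_aic <- step(
  lm(interest_rate ~ 1, data = loans),  # start from the intercept-only model
  scope = formula(full_model),          # predictors allowed to enter
  direction = "forward",
  trace = FALSE
)
```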
Alternatively, one could choose to include or exclude predictors from a model based on expert opinion or due to research focus.
In fact, many statisticians discourage the use of stepwise regression *alone* for model selection and advocate, instead, for a more thoughtful approach that carefully considers the research focus and features of the data.
\clearpage
With real data, there is often a need to describe how multiple variables can be modeled together.
In this chapter, we have presented one approach using multiple linear regression.
Each coefficient represents how the model predicts the outcome might change with a one unit increase of that predictor, *given* the rest of the predictor variables in the model.
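For example (a sketch with placeholder variable and data names, not the chapter's own code):

```{r}
#| eval: false
# Hypothetical example: the coefficient on debt_to_income is the model's
# predicted change in interest_rate for a one unit increase in
# debt_to_income, holding term fixed.
fit <- lm(interest_rate ~ debt_to_income + term, data = loans)
coef(fit)
```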
Working with and interpreting multivariable models can be tricky, especially when the predictor variables show multicollinearity.
There is often no perfect or "right" final model; however, using the adjusted $R^2$ value is one way to identify important predictor variables for a final regression model.
In later chapters we will generalize multiple linear regression models to a larger population of interest from which the dataset was sampled.
### Terms
