ols_regress(mpg ~ disp + hp + wt, data = mtcars)
#> Model Summary
#> ---------------------------------------------------------------
#> R                         0.909       RMSE                2.468
@@ -243,9 +243,9 @@
 # using grouping variable
+if (require("descriptr")) {
+  library(descriptr)
+  ols_test_bartlett(mtcarz, 'mpg', group_var = 'cyl')
+}
+#> Loading required package: descriptr
+#>
-#> Attaching package: 'descriptr'
+#> Attaching package: ‘descriptr’
+#>
+#> The following object is masked _by_ ‘.GlobalEnv’:
+#>
+#>     hsb
-#> The following object is masked from 'package:olsrr':
+#> The following object is masked from ‘package:olsrr’:
+#>
+#>     hsb
-ols_test_bartlett(mtcarz, 'mpg', group_var = 'cyl')
-#>
-#> Bartlett's Test of Homogenity of Variances
-#> ------------------------------------------------
diff --git a/docs/reference/surgical.html b/docs/reference/surgical.html
index 06cac30..63d3269 100644
--- a/docs/reference/surgical.html
+++ b/docs/reference/surgical.html
[generated HTML for the surgical data set reference page (a data frame with 54 rows and 9 variables: bcs, pindex, enzyme_test, liver_test, age, gender, alc_mod, alc_heavy, y); the visible text is unchanged and the variable descriptions appear in the man/surgical.Rd diff below]
diff --git a/man/ols_coll_diag.Rd b/man/ols_coll_diag.Rd
index 38cce28..fa769fb 100644
--- a/man/ols_coll_diag.Rd
+++ b/man/ols_coll_diag.Rd
@@ -39,9 +39,9 @@ Percent of variance in the predictor that cannot be accounted for by other predi
Steps to calculate tolerance:
\itemize{
- \item Regress the kth predictor on rest of the predictors in the model.
- \item Compute \eqn{R^2} - the coefficient of determination from the regression in the above step.
- \item \eqn{Tolerance = 1 - R^2}
+\item Regress the kth predictor on rest of the predictors in the model.
+\item Compute \eqn{R^2} - the coefficient of determination from the regression in the above step.
+\item \eqn{Tolerance = 1 - R^2}
}
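The three steps above can be sketched by hand in base R. This is an illustration only (not code from the diff), using `wt` from an `mtcars` model as the kth predictor:

```r
# Regress the kth predictor (wt) on the rest of the predictors in the model.
aux <- lm(wt ~ disp + hp, data = mtcars)

# R^2 - the coefficient of determination from the auxiliary regression.
r2 <- summary(aux)$r.squared

# Tolerance = 1 - R^2
tolerance <- 1 - r2
tolerance
```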
\emph{Variance Inflation Factor}
@@ -57,9 +57,9 @@ requiring correction.
Steps to calculate VIF:
\itemize{
- \item Regress the kth predictor on rest of the predictors in the model.
- \item Compute \eqn{R^2} - the coefficient of determination from the regression in the above step.
- \item \eqn{Tolerance = 1 / 1 - R^2 = 1 / Tolerance}
+\item Regress the kth predictor on rest of the predictors in the model.
+\item Compute \eqn{R^2} - the coefficient of determination from the regression in the above step.
+\item \eqn{VIF = 1 / (1 - R^2) = 1 / Tolerance}
}
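The same auxiliary regression gives the VIF as the reciprocal of the tolerance. A minimal sketch on an illustrative `mtcars` model (not package code):

```r
# Regress the kth predictor (wt) on the remaining predictors.
aux <- lm(wt ~ disp + hp, data = mtcars)
r2  <- summary(aux)$r.squared

# VIF = 1 / (1 - R^2) = 1 / Tolerance
vif <- 1 / (1 - r2)
vif
```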
\emph{Condition Index}
diff --git a/man/ols_plot_added_variable.Rd b/man/ols_plot_added_variable.Rd
index 41e3af5..3f4c61d 100644
--- a/man/ols_plot_added_variable.Rd
+++ b/man/ols_plot_added_variable.Rd
@@ -29,9 +29,9 @@ model. Let the response variable of the model be \emph{Y}
Steps to construct an added variable plot:
\itemize{
- \item Regress \emph{Y} on all variables other than \emph{X} and store the residuals (\emph{Y} residuals).
- \item Regress \emph{X} on all the other variables included in the model (\emph{X} residuals).
- \item Construct a scatter plot of \emph{Y} residuals and \emph{X} residuals.
+\item Regress \emph{Y} on all variables other than \emph{X} and store the residuals (\emph{Y} residuals).
+\item Regress \emph{X} on all the other variables included in the model (\emph{X} residuals).
+\item Construct a scatter plot of \emph{Y} residuals and \emph{X} residuals.
}
What do the \emph{Y} and \emph{X} residuals represent? The \emph{Y} residuals represent the part
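The three construction steps can be sketched in base R. An illustration only (not package code), using `mpg ~ disp + hp + wt` from `mtcars` with `wt` as the variable of interest:

```r
# Y residuals: regress the response on all variables other than wt.
y_res <- resid(lm(mpg ~ disp + hp, data = mtcars))

# X residuals: regress wt on the other variables in the model.
x_res <- resid(lm(wt ~ disp + hp, data = mtcars))

# Scatter plot of Y residuals against X residuals: the added variable plot.
plot(x_res, y_res)
```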
@@ -56,5 +56,5 @@ Kutner, MH, Nachtscheim CJ, Neter J and Li W., 2004, Applied Linear Statistical
Chicago, IL., McGraw Hill/Irwin.
}
\seealso{
-[ols_plot_resid_regressor()], [ols_plot_comp_plus_resid()]
+\code{\link[=ols_plot_resid_regressor]{ols_plot_resid_regressor()}}, \code{\link[=ols_plot_comp_plus_resid]{ols_plot_comp_plus_resid()}}
}
diff --git a/man/ols_plot_comp_plus_resid.Rd b/man/ols_plot_comp_plus_resid.Rd
index 18cbe6a..1122dce 100644
--- a/man/ols_plot_comp_plus_resid.Rd
+++ b/man/ols_plot_comp_plus_resid.Rd
@@ -28,5 +28,5 @@ Kutner, MH, Nachtscheim CJ, Neter J and Li W., 2004, Applied Linear Statistical
Chicago, IL., McGraw Hill/Irwin.
}
\seealso{
-[ols_plot_added_variable()], [ols_plot_resid_regressor()]
+\code{\link[=ols_plot_added_variable]{ols_plot_added_variable()}}, \code{\link[=ols_plot_resid_regressor]{ols_plot_resid_regressor()}}
}
diff --git a/man/ols_plot_cooksd_bar.Rd b/man/ols_plot_cooksd_bar.Rd
index 1082e73..9020afe 100644
--- a/man/ols_plot_cooksd_bar.Rd
+++ b/man/ols_plot_cooksd_bar.Rd
@@ -4,14 +4,18 @@
\alias{ols_plot_cooksd_bar}
\title{Cooks' D bar plot}
\usage{
-ols_plot_cooksd_bar(model, type = 1, print_plot = TRUE)
+ols_plot_cooksd_bar(model, type = 1, threshold = NULL, print_plot = TRUE)
}
\arguments{
\item{model}{An object of class \code{lm}.}
-\item{type}{An integer between 1 and 5 selecting one of the 6 methods for computing the threshold.}
+\item{type}{An integer between 1 and 5 selecting one of the 5 methods for
+computing the threshold.}
-\item{print_plot}{logical; if \code{TRUE}, prints the plot else returns a plot object.}
+\item{threshold}{Threshold for detecting outliers.}
+
+\item{print_plot}{logical; if \code{TRUE}, prints the plot else returns a
+plot object.}
}
\value{
\code{ols_plot_cooksd_bar} returns a list containing the
@@ -33,9 +37,9 @@ residual and leverage i.e it takes it account both the \emph{x} value and
Steps to compute Cook's distance:
\itemize{
- \item Delete observations one at a time.
- \item Refit the regression model on remaining \eqn{n - 1} observations
- \item examine how much all of the fitted values change when the ith observation is deleted.
+\item Delete observations one at a time.
+\item Refit the regression model on remaining \eqn{n - 1} observations
+\item examine how much all of the fitted values change when the ith observation is deleted.
}
A data point having a large cook's d indicates that the data point strongly
@@ -44,25 +48,27 @@ the threshold used for detecting or classifying observations as outliers and
we list them below.
\itemize{
- \item \strong{Type 1} : 4 / n
- \item \strong{Type 2} : 4 / (n - k - 1)
- \item \strong{Type 3} : ~1
- \item \strong{Type 4} : 1 / (n - k - 1)
- \item \strong{Type 5} : 3 * mean(Vector of cook's distance values)
+\item \strong{Type 1} : 4 / n
+\item \strong{Type 2} : 4 / (n - k - 1)
+\item \strong{Type 3} : ~1
+\item \strong{Type 4} : 1 / (n - k - 1)
+\item \strong{Type 5} : 3 * mean(Vector of cook's distance values)
}
where \strong{n} and \strong{k} stand for
\itemize{
- \item \strong{n}: Number of observations
- \item \strong{k}: Number of predictors
+\item \strong{n}: Number of observations
+\item \strong{k}: Number of predictors
}
}
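The five thresholds can be computed directly from the model. A sketch against an illustrative `mtcars` model (not package code):

```r
model <- lm(mpg ~ disp + hp + wt, data = mtcars)
cd <- cooks.distance(model)
n  <- nrow(mtcars)  # number of observations
k  <- 3             # number of predictors

thresholds <- c(type1 = 4 / n,
                type2 = 4 / (n - k - 1),
                type3 = 1,
                type4 = 1 / (n - k - 1),
                type5 = 3 * mean(cd))
thresholds
```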
\examples{
model <- lm(mpg ~ disp + hp + wt, data = mtcars)
ols_plot_cooksd_bar(model)
+ols_plot_cooksd_bar(model, type = 4)
+ols_plot_cooksd_bar(model, threshold = 0.2)
}
\seealso{
-[ols_plot_cooksd_chart()]
+\code{\link[=ols_plot_cooksd_chart]{ols_plot_cooksd_chart()}}
}
diff --git a/man/ols_plot_cooksd_chart.Rd b/man/ols_plot_cooksd_chart.Rd
index c61d98e..4e90117 100644
--- a/man/ols_plot_cooksd_chart.Rd
+++ b/man/ols_plot_cooksd_chart.Rd
@@ -4,13 +4,15 @@
\alias{ols_plot_cooksd_chart}
\title{Cooks' D chart}
\usage{
-ols_plot_cooksd_chart(model, type = 1, print_plot = TRUE)
+ols_plot_cooksd_chart(model, type = 1, threshold = NULL, print_plot = TRUE)
}
\arguments{
\item{model}{An object of class \code{lm}.}
\item{type}{An integer between 1 and 5 selecting one of the 6 methods for computing the threshold.}
+\item{threshold}{Threshold for detecting outliers.}
+
\item{print_plot}{logical; if \code{TRUE}, prints the plot else returns a plot object.}
}
\value{
@@ -33,9 +35,9 @@ residual and leverage i.e it takes it account both the \emph{x} value and
Steps to compute Cook's distance:
\itemize{
- \item Delete observations one at a time.
- \item Refit the regression model on remaining \eqn{n - 1} observations
- \item exmine how much all of the fitted values change when the ith observation is deleted.
+\item Delete observations one at a time.
+\item Refit the regression model on remaining \eqn{n - 1} observations
+\item examine how much all of the fitted values change when the ith observation is deleted.
}
A data point having a large cook's d indicates that the data point strongly
@@ -44,25 +46,27 @@ the threshold used for detecting or classifying observations as outliers
and we list them below.
\itemize{
- \item \strong{Type 1} : 4 / n
- \item \strong{Type 2} : 4 / (n - k - 1)
- \item \strong{Type 3} : ~1
- \item \strong{Type 4} : 1 / (n - k - 1)
- \item \strong{Type 5} : 3 * mean(Vector of cook's distance values)
+\item \strong{Type 1} : 4 / n
+\item \strong{Type 2} : 4 / (n - k - 1)
+\item \strong{Type 3} : ~1
+\item \strong{Type 4} : 1 / (n - k - 1)
+\item \strong{Type 5} : 3 * mean(Vector of cook's distance values)
}
where \strong{n} and \strong{k} stand for
\itemize{
- \item \strong{n}: Number of observations
- \item \strong{k}: Number of predictors
+\item \strong{n}: Number of observations
+\item \strong{k}: Number of predictors
}
}
\examples{
model <- lm(mpg ~ disp + hp + wt, data = mtcars)
ols_plot_cooksd_chart(model)
+ols_plot_cooksd_chart(model, type = 4)
+ols_plot_cooksd_chart(model, threshold = 0.2)
}
\seealso{
-[ols_plot_cooksd_bar()]
+\code{\link[=ols_plot_cooksd_bar]{ols_plot_cooksd_bar()}}
}
diff --git a/man/ols_plot_dfbetas.Rd b/man/ols_plot_dfbetas.Rd
index 1ee24d1..944b594 100644
--- a/man/ols_plot_dfbetas.Rd
+++ b/man/ols_plot_dfbetas.Rd
@@ -41,5 +41,5 @@ Wiley Series in Probability and Mathematical Statistics.
New York: John Wiley & Sons. pp. ISBN 0-471-05856-4.
}
\seealso{
-[ols_plot_dffits()]
+\code{\link[=ols_plot_dffits]{ols_plot_dffits()}}
}
diff --git a/man/ols_plot_dffits.Rd b/man/ols_plot_dffits.Rd
index 29475b1..d245008 100644
--- a/man/ols_plot_dffits.Rd
+++ b/man/ols_plot_dffits.Rd
@@ -9,10 +9,10 @@ ols_plot_dffits(model, size_adj_threshold = TRUE, print_plot = TRUE)
\arguments{
\item{model}{An object of class \code{lm}.}
-\item{size_adj_threshold}{logical; if \code{TRUE} (the default), size
+\item{size_adj_threshold}{logical; if \code{TRUE} (the default), size
adjusted threshold is used to determine influential observations.}
-\item{print_plot}{logical; if \code{TRUE}, prints the plot else returns a
+\item{print_plot}{logical; if \code{TRUE}, prints the plot else returns a
plot object.}
}
\value{
@@ -33,16 +33,16 @@ when the ith data point is omitted.
Steps to compute DFFITs:
\itemize{
- \item Delete observations one at a time.
- \item Refit the regression model on remaining \eqn{n - 1} observations
- \item examine how much all of the fitted values change when the ith observation is deleted.
+\item Delete observations one at a time.
+\item Refit the regression model on remaining \eqn{n - 1} observations
+\item examine how much all of the fitted values change when the ith observation is deleted.
}
An observation is deemed influential if the absolute value of its DFFITS value is greater than:
\deqn{2\sqrt((p + 1) / (n - p -1))}
-A size-adjusted cutoff recommended by Belsley, Kuh, and Welsch is
-\deqn{2\sqrt(p / n)} and is used by default in **olsrr**.
+A size-adjusted cutoff recommended by Belsley, Kuh, and Welsch is
+\deqn{2\sqrt{p / n}} and is used by default in \strong{olsrr}.
where \code{n} is the number of observations and \code{p} is the number of predictors including intercept.
}
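Both cutoffs are easy to compute by hand. A sketch on an illustrative `mtcars` model (not package code):

```r
model <- lm(mpg ~ disp + hp + wt, data = mtcars)
n <- nrow(mtcars)
p <- length(coef(model))  # number of predictors including intercept

conventional  <- 2 * sqrt((p + 1) / (n - p - 1))
size_adjusted <- 2 * sqrt(p / n)  # the default cutoff in olsrr

# Observations whose |DFFITS| exceeds the size-adjusted cutoff.
influential <- which(abs(dffits(model)) > size_adjusted)
```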
@@ -60,5 +60,5 @@ Wiley Series in Probability and Mathematical Statistics.
New York: John Wiley & Sons. ISBN 0-471-05856-4.
}
\seealso{
-[ols_plot_dfbetas()]
+\code{\link[=ols_plot_dfbetas]{ols_plot_dfbetas()}}
}
diff --git a/man/ols_plot_hadi.Rd b/man/ols_plot_hadi.Rd
index 965fb60..95b0107 100644
--- a/man/ols_plot_hadi.Rd
+++ b/man/ols_plot_hadi.Rd
@@ -25,5 +25,5 @@ ols_plot_hadi(model)
Chatterjee, Samprit and Hadi, Ali. Regression Analysis by Example. 5th ed. N.p.: John Wiley & Sons, 2012. Print.
}
\seealso{
-[ols_plot_resid_pot()]
+\code{\link[=ols_plot_resid_pot]{ols_plot_resid_pot()}}
}
diff --git a/man/ols_plot_resid_fit.Rd b/man/ols_plot_resid_fit.Rd
index 1a038e2..48483d5 100644
--- a/man/ols_plot_resid_fit.Rd
+++ b/man/ols_plot_resid_fit.Rd
@@ -19,9 +19,9 @@ x axis to detect non-linearity, unequal error variances, and outliers.
Characteristics of a well behaved residual vs fitted plot:
\itemize{
- \item The residuals spread randomly around the 0 line indicating that the relationship is linear.
- \item The residuals form an approximate horizontal band around the 0 line indicating homogeneity of error variance.
- \item No one residual is visibly away from the random pattern of the residuals indicating that there are no outliers.
+\item The residuals spread randomly around the 0 line indicating that the relationship is linear.
+\item The residuals form an approximate horizontal band around the 0 line indicating homogeneity of error variance.
+\item No one residual is visibly away from the random pattern of the residuals indicating that there are no outliers.
}
}
\examples{
diff --git a/man/ols_plot_resid_lev.Rd b/man/ols_plot_resid_lev.Rd
index 666d81e..8e9600b 100644
--- a/man/ols_plot_resid_lev.Rd
+++ b/man/ols_plot_resid_lev.Rd
@@ -23,5 +23,5 @@ ols_plot_resid_lev(model, threshold = 3)
}
\seealso{
-[ols_plot_resid_stud_fit()], [ols_plot_resid_lev()]
+\code{\link[=ols_plot_resid_stud_fit]{ols_plot_resid_stud_fit()}}, \code{\link[=ols_plot_resid_lev]{ols_plot_resid_lev()}}
}
diff --git a/man/ols_plot_resid_pot.Rd b/man/ols_plot_resid_pot.Rd
index a6889c2..e043eb8 100644
--- a/man/ols_plot_resid_pot.Rd
+++ b/man/ols_plot_resid_pot.Rd
@@ -24,5 +24,5 @@ ols_plot_resid_pot(model)
Chatterjee, Samprit and Hadi, Ali. Regression Analysis by Example. 5th ed. N.p.: John Wiley & Sons, 2012. Print.
}
\seealso{
-[ols_plot_hadi()]
+\code{\link[=ols_plot_hadi]{ols_plot_hadi()}}
}
diff --git a/man/ols_plot_resid_regressor.Rd b/man/ols_plot_resid_regressor.Rd
index 15f8370..cee5003 100644
--- a/man/ols_plot_resid_regressor.Rd
+++ b/man/ols_plot_resid_regressor.Rd
@@ -25,5 +25,5 @@ ols_plot_resid_regressor(model, 'drat')
}
\seealso{
-[ols_plot_added_variable()], [ols_plot_comp_plus_resid()]
+\code{\link[=ols_plot_added_variable]{ols_plot_added_variable()}}, \code{\link[=ols_plot_comp_plus_resid]{ols_plot_comp_plus_resid()}}
}
diff --git a/man/ols_plot_resid_stand.Rd b/man/ols_plot_resid_stand.Rd
index 0ded501..3dce89e 100644
--- a/man/ols_plot_resid_stand.Rd
+++ b/man/ols_plot_resid_stand.Rd
@@ -35,5 +35,5 @@ ols_plot_resid_stand(model, threshold = 3)
}
\seealso{
-[ols_plot_resid_stud()]
+\code{\link[=ols_plot_resid_stud]{ols_plot_resid_stud()}}
}
diff --git a/man/ols_plot_resid_stud.Rd b/man/ols_plot_resid_stud.Rd
index 8a461c5..aa81fc9 100644
--- a/man/ols_plot_resid_stud.Rd
+++ b/man/ols_plot_resid_stud.Rd
@@ -39,5 +39,5 @@ ols_plot_resid_stud(model, threshold = 2)
}
\seealso{
-[ols_plot_resid_stand()]
+\code{\link[=ols_plot_resid_stand]{ols_plot_resid_stand()}}
}
diff --git a/man/ols_plot_resid_stud_fit.Rd b/man/ols_plot_resid_stud_fit.Rd
index dcb8008..3fe8418 100644
--- a/man/ols_plot_resid_stud_fit.Rd
+++ b/man/ols_plot_resid_stud_fit.Rd
@@ -42,6 +42,6 @@ ols_plot_resid_stud_fit(model, threshold = 3)
}
\seealso{
-[ols_plot_resid_lev()], [ols_plot_resid_stand()],
- [ols_plot_resid_stud()]
+\code{\link[=ols_plot_resid_lev]{ols_plot_resid_lev()}}, \code{\link[=ols_plot_resid_stand]{ols_plot_resid_stand()}},
+\code{\link[=ols_plot_resid_stud]{ols_plot_resid_stud()}}
}
diff --git a/man/ols_pure_error_anova.Rd b/man/ols_pure_error_anova.Rd
index 5cb7950..d758262 100644
--- a/man/ols_pure_error_anova.Rd
+++ b/man/ols_pure_error_anova.Rd
@@ -46,8 +46,8 @@ The residual sum of squares resulting from a regression can be decomposed
into 2 components:
\itemize{
- \item Due to lack of fit
- \item Due to random variation
+\item Due to lack of fit
+\item Due to random variation
}
If most of the error is due to lack of fit and not just random error, the
diff --git a/man/ols_regress.Rd b/man/ols_regress.Rd
index 3047154..67df88e 100644
--- a/man/ols_regress.Rd
+++ b/man/ols_regress.Rd
@@ -73,5 +73,5 @@ ols_regress(mpg ~ disp * wt, data = mtcars, iterm = TRUE)
}
\references{
-https://www.ssc.wisc.edu/~hemken/Stataworkshops/stdBeta/Getting%20Standardized%20Coefficients%20Right.pdf
+https://www.ssc.wisc.edu/~hemken/Stataworkshops/stdBeta/Getting\%20Standardized\%20Coefficients\%20Right.pdf
}
diff --git a/man/ols_test_bartlett.Rd b/man/ols_test_bartlett.Rd
index 674ca69..938d4bb 100644
--- a/man/ols_test_bartlett.Rd
+++ b/man/ols_test_bartlett.Rd
@@ -35,8 +35,10 @@ is an alternative test that is less sensitive to departures from normality.
}
\examples{
# using grouping variable
-library(descriptr)
-ols_test_bartlett(mtcarz, 'mpg', group_var = 'cyl')
+if (require("descriptr")) {
+ library(descriptr)
+ ols_test_bartlett(mtcarz, 'mpg', group_var = 'cyl')
+}
# using variables
ols_test_bartlett(hsb, 'read', 'write')
diff --git a/man/ols_test_breusch_pagan.Rd b/man/ols_test_breusch_pagan.Rd
index 2c35ced..6faea8b 100644
--- a/man/ols_test_breusch_pagan.Rd
+++ b/man/ols_test_breusch_pagan.Rd
@@ -62,11 +62,11 @@ values of a independent variable.
Computation
\itemize{
- \item Fit a regression model
- \item Regress the squared residuals from the above model on the independent variables
- \item Compute \eqn{nR^2}. It follows a chi square distribution with p -1 degrees of
- freedom, where p is the number of independent variables, n is the sample size and
- \eqn{R^2} is the coefficient of determination from the regression in step 2.
+\item Fit a regression model
+\item Regress the squared residuals from the above model on the independent variables
+\item Compute \eqn{nR^2}. It follows a chi square distribution with p -1 degrees of
+freedom, where p is the number of independent variables, n is the sample size and
+\eqn{R^2} is the coefficient of determination from the regression in step 2.
}
}
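The three computation steps can be sketched as follows on an illustrative `mtcars` model. This is a hand-rolled sketch of the test statistic, not the package's implementation:

```r
# Step 1: fit a regression model.
model <- lm(mpg ~ disp + hp + wt, data = mtcars)

# Step 2: regress the squared residuals on the independent variables.
aux <- lm(resid(model)^2 ~ disp + hp + wt, data = mtcars)

# Step 3: n * R^2 follows a chi-square distribution.
n    <- nrow(mtcars)
bp   <- n * summary(aux)$r.squared
pval <- pchisq(bp, df = 3, lower.tail = FALSE)  # df = regressors in step 2
```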
\examples{
diff --git a/man/olsrr.Rd b/man/olsrr.Rd
index a8a262f..98e7e81 100644
--- a/man/olsrr.Rd
+++ b/man/olsrr.Rd
@@ -3,6 +3,8 @@
\docType{package}
\name{olsrr}
\alias{olsrr}
+\alias{_PACKAGE}
+\alias{olsrr-package}
\title{\code{olsrr} package}
\description{
Tools for teaching and learning OLS regression
@@ -10,5 +12,18 @@ Tools for teaching and learning OLS regression
\details{
See the README on
\href{https://github.com/rsquaredacademy/olsrr}{GitHub}
+}
+\seealso{
+Useful links:
+\itemize{
+ \item \url{https://olsrr.rsquaredacademy.com/}
+ \item \url{https://github.com/rsquaredacademy/olsrr}
+ \item Report bugs at \url{https://github.com/rsquaredacademy/olsrr/issues}
+}
+
+}
+\author{
+\strong{Maintainer}: Aravind Hebbali \email{hebbali.aravind@gmail.com}
+
}
\keyword{internal}
diff --git a/man/surgical.Rd b/man/surgical.Rd
index a4bfef0..66295d4 100644
--- a/man/surgical.Rd
+++ b/man/surgical.Rd
@@ -7,15 +7,15 @@
\format{
A data frame with 54 rows and 9 variables:
\describe{
- \item{bcs}{blood clotting score}
- \item{pindex}{prognostic index}
- \item{enzyme_test}{enzyme function test score}
- \item{liver_test}{liver function test score}
- \item{age}{age, in years}
- \item{gender}{indicator variable for gender (0 = male, 1 = female)}
- \item{alc_mod}{indicator variable for history of alcohol use (0 = None, 1 = Moderate)}
- \item{alc_heavy}{indicator variable for history of alcohol use (0 = None, 1 = Heavy)}
- \item{y}{Survival Time}
+\item{bcs}{blood clotting score}
+\item{pindex}{prognostic index}
+\item{enzyme_test}{enzyme function test score}
+\item{liver_test}{liver function test score}
+\item{age}{age, in years}
+\item{gender}{indicator variable for gender (0 = male, 1 = female)}
+\item{alc_mod}{indicator variable for history of alcohol use (0 = None, 1 = Moderate)}
+\item{alc_heavy}{indicator variable for history of alcohol use (0 = None, 1 = Heavy)}
+\item{y}{Survival Time}
}
}
\source{
diff --git a/revdep/.gitignore b/revdep/.gitignore
new file mode 100644
index 0000000..111ab32
--- /dev/null
+++ b/revdep/.gitignore
@@ -0,0 +1,7 @@
+checks
+library
+checks.noindex
+library.noindex
+cloud.noindex
+data.sqlite
+*.html
diff --git a/revdep/README.md b/revdep/README.md
new file mode 100644
index 0000000..c0c7e87
--- /dev/null
+++ b/revdep/README.md
@@ -0,0 +1,112 @@
+# Platform
+
+|field |value |
+|:--------|:-----------------------------------|
+|version |R version 4.3.2 (2023-10-31 ucrt) |
+|os |Windows 10 x64 (build 19045) |
+|system |x86_64, mingw32 |
+|ui |RStudio |
+|language |(EN) |
+|collate |English_India.utf8 |
+|ctype |en_US.UTF-8 |
+|tz |Asia/Calcutta |
+|date |2024-02-12 |
+|rstudio |2023.12.1+402 Ocean Storm (desktop) |
+|pandoc |NA |
+
+# Dependencies
+
+|package |old |new |Δ |
+|:------------|:----------|:----------|:--|
+|olsrr |0.5.3 |0.6.0 |* |
+|abind |1.4-5 |1.4-5 | |
+|backports |1.4.1 |1.4.1 | |
+|base64enc |NA |0.1-3 |* |
+|brio |1.1.4 |1.1.4 | |
+|broom |1.0.5 |1.0.5 | |
+|bslib |NA |0.6.1 |* |
+|cachem |NA |1.0.8 |* |
+|callr |3.7.3 |3.7.3 | |
+|car |3.1-2 |3.1-2 | |
+|carData |3.0-5 |3.0-5 | |
+|cli |3.6.2 |3.6.2 | |
+|colorspace |2.1-0 |2.1-0 | |
+|commonmark |NA |1.9.1 |* |
+|cpp11 |0.4.7 |0.4.7 | |
+|crayon |1.5.2 |1.5.2 | |
+|data.table |1.15.0 |NA |* |
+|desc |1.4.3 |1.4.3 | |
+|diffobj |0.3.5 |0.3.5 | |
+|digest |0.6.34 |0.6.34 | |
+|dplyr |1.1.4 |1.1.4 | |
+|ellipsis |0.3.2 |0.3.2 | |
+|evaluate |0.23 |0.23 | |
+|fansi |1.0.6 |1.0.6 | |
+|farver |2.1.1 |2.1.1 | |
+|fastmap |NA |1.1.1 |* |
+|fontawesome |NA |0.5.2 |* |
+|fs |1.6.3 |1.6.3 | |
+|generics |0.1.3 |0.1.3 | |
+|ggplot2 |3.4.4 |3.4.4 | |
+|glue |1.7.0 |1.7.0 | |
+|goftest |1.2-3 |1.2-3 | |
+|gridExtra |2.3 |2.3 | |
+|gtable |0.3.4 |0.3.4 | |
+|htmltools |NA |0.5.7 |* |
+|httpuv |NA |1.6.14 |* |
+|isoband |0.2.7 |0.2.7 | |
+|jquerylib |NA |0.1.4 |* |
+|jsonlite |1.8.8 |1.8.8 | |
+|labeling |0.4.3 |0.4.3 | |
+|later |NA |1.3.2 |* |
+|lifecycle |1.0.4 |1.0.4 | |
+|lme4 |1.1-35.1 |1.1-35.1 | |
+|magrittr |2.0.3 |2.0.3 | |
+|MatrixModels |0.5-3 |0.5-3 | |
+|memoise |NA |2.0.1 |* |
+|mime |NA |0.12 |* |
+|minqa |1.2.6 |1.2.6 | |
+|munsell |0.5.0 |0.5.0 | |
+|nloptr |2.0.3 |2.0.3 | |
+|nortest |1.0-4 |1.0-4 | |
+|numDeriv |2016.8-1.1 |2016.8-1.1 | |
+|pbkrtest |0.5.2 |0.5.2 | |
+|pillar |1.9.0 |1.9.0 | |
+|pkgbuild |1.4.3 |1.4.3 | |
+|pkgconfig |2.0.3 |2.0.3 | |
+|pkgload |1.3.4 |1.3.4 | |
+|praise |1.0.0 |1.0.0 | |
+|processx |3.8.3 |3.8.3 | |
+|promises |NA |1.2.1 |* |
+|ps |1.7.6 |1.7.6 | |
+|purrr |1.0.2 |1.0.2 | |
+|quantreg |5.97 |5.97 | |
+|R6 |2.5.1 |2.5.1 | |
+|rappdirs |NA |0.3.3 |* |
+|RColorBrewer |1.1-3 |1.1-3 | |
+|Rcpp |1.0.12 |1.0.12 | |
+|RcppEigen |0.3.3.9.4 |0.3.3.9.4 | |
+|rematch2 |2.1.2 |2.1.2 | |
+|rlang |1.1.3 |1.1.3 | |
+|rprojroot |2.0.4 |2.0.4 | |
+|sass |NA |0.4.8 |* |
+|scales |1.3.0 |1.3.0 | |
+|shiny |NA |1.8.0 |* |
+|sourcetools |NA |0.1.7-1 |* |
+|SparseM |1.81 |1.81 | |
+|stringi |1.8.3 |1.8.3 | |
+|stringr |1.5.1 |1.5.1 | |
+|testthat |3.2.1 |3.2.1 | |
+|tibble |3.2.1 |3.2.1 | |
+|tidyr |1.3.1 |1.3.1 | |
+|tidyselect |1.2.0 |1.2.0 | |
+|utf8 |1.2.4 |1.2.4 | |
+|vctrs |0.6.5 |0.6.5 | |
+|viridisLite |0.4.2 |0.4.2 | |
+|waldo |0.5.2 |0.5.2 | |
+|withr |3.0.0 |3.0.0 | |
+|xplorerr |NA |0.1.2 |* |
+|xtable |NA |1.8-4 |* |
+
+# Revdeps
+
diff --git a/revdep/cran.md b/revdep/cran.md
new file mode 100644
index 0000000..ab1853c
--- /dev/null
+++ b/revdep/cran.md
@@ -0,0 +1,7 @@
+## revdepcheck results
+
+We checked 4 reverse dependencies, comparing R CMD check results across CRAN and dev versions of this package.
+
+ * We saw 0 new problems
+ * We failed to check 0 packages
+
diff --git a/revdep/failures.md b/revdep/failures.md
new file mode 100644
index 0000000..9a20736
--- /dev/null
+++ b/revdep/failures.md
@@ -0,0 +1 @@
+*Wow, no problems at all. :)*
\ No newline at end of file
diff --git a/revdep/problems.md b/revdep/problems.md
new file mode 100644
index 0000000..9a20736
--- /dev/null
+++ b/revdep/problems.md
@@ -0,0 +1 @@
+*Wow, no problems at all. :)*
\ No newline at end of file
diff --git a/tests/testthat/_snaps/visual/cooks-d-bar-chart-threshold.svg b/tests/testthat/_snaps/visual/cooks-d-bar-chart-threshold.svg
new file mode 100644
index 0000000..0314202
--- /dev/null
+++ b/tests/testthat/_snaps/visual/cooks-d-bar-chart-threshold.svg
@@ -0,0 +1,122 @@
+[122-line SVG snapshot content omitted]
diff --git a/tests/testthat/_snaps/visual/cooks-d-bar-plot-threshold.svg b/tests/testthat/_snaps/visual/cooks-d-bar-plot-threshold.svg
new file mode 100644
index 0000000..839a573
--- /dev/null
+++ b/tests/testthat/_snaps/visual/cooks-d-bar-plot-threshold.svg
@@ -0,0 +1,99 @@
+[99-line SVG snapshot content omitted]
diff --git a/tests/testthat/test-bartlett.R b/tests/testthat/test-bartlett.R
index d3ee598..47b7ea2 100644
--- a/tests/testthat/test-bartlett.R
+++ b/tests/testthat/test-bartlett.R
@@ -1,4 +1,4 @@
-test_that("all output from the test match the result", {
+test_that("output from the test match the result when using variables", {
b <- ols_test_bartlett(mtcars, 'mpg', 'disp')
@@ -8,14 +8,18 @@ test_that("all output from the test match the result", {
expect_equal(b$var_c, c("mpg", "disp"), ignore_attr = TRUE)
expect_null(b$g_var)
- b <- ols_test_bartlett(descriptr::mtcarz, 'mpg', group_var = 'vs')
+})
- expect_equal(round(b$fstat, 3), 1.585)
- expect_equal(round(b$pval, 3), 0.208)
- expect_equal(b$df, 1)
- expect_equal(b$var_c, "mpg")
- expect_equal(b$g_var, "vs")
+test_that("output from test match the result when using grouping variables", {
+ if (requireNamespace("descriptr", quietly = TRUE)) {
+ b <- ols_test_bartlett(descriptr::mtcarz, 'mpg', group_var = 'vs')
+ expect_equal(round(b$fstat, 3), 1.585)
+ expect_equal(round(b$pval, 3), 0.208)
+ expect_equal(b$df, 1)
+ expect_equal(b$var_c, "mpg")
+ expect_equal(b$g_var, "vs")
+ }
})
test_that("bartlett test throws error messages", {
diff --git a/tests/testthat/test-norm-output.R b/tests/testthat/test-norm-output.R
index feee35c..e08b90c 100644
--- a/tests/testthat/test-norm-output.R
+++ b/tests/testthat/test-norm-output.R
@@ -8,8 +8,3 @@ test_that("output from ols_corr_test is as expected", {
expect_equal(round(ols_test_correlation(model), 3), 0.97)
})
-test_that("ols_test_normality returns error messages", {
- model <- glm(prog ~ female + read + science, data = hsb, family = binomial(link = 'logit'))
- expect_error(ols_test_normality(hsb$female), "y must be numeric")
- expect_error(ols_test_normality(model), "Please specify a OLS linear regression model.")
-})
\ No newline at end of file
diff --git a/tests/testthat/test-visual.R b/tests/testthat/test-visual.R
index 68fd39e..692bd2c 100644
--- a/tests/testthat/test-visual.R
+++ b/tests/testthat/test-visual.R
@@ -70,6 +70,9 @@ test_that("cooks d bar plot is as expected", {
p <- ols_plot_cooksd_bar(model, print_plot = FALSE)
vdiffr::expect_doppelganger("cooks d bar plot", p$plot)
+ p1 <- ols_plot_cooksd_bar(model, threshold = 0.2, print_plot = FALSE)
+ vdiffr::expect_doppelganger("cooks d bar plot threshold", p1$plot)
+
p2 <- ols_plot_cooksd_bar(model, type = 2, print_plot = FALSE)
vdiffr::expect_doppelganger("cooks d bar plot type 2", p2$plot)
@@ -87,6 +90,9 @@ test_that("cooks d bar chart is as expected", {
skip_on_cran()
p <- ols_plot_cooksd_chart(model, print_plot = FALSE)
vdiffr::expect_doppelganger("cooks d bar chart", p$plot)
+
+ p1 <- ols_plot_cooksd_chart(model, threshold = 0.2, print_plot = FALSE)
+ vdiffr::expect_doppelganger("cooks d bar chart threshold", p1$plot)
})
test_that("dffits plot is as expected", {
@@ -415,4 +421,4 @@ test_that("sbic both direction regression plot is as expected", {
p2 <- plot(ols_step_both_sbic(model), details = FALSE, print_plot = FALSE)
vdiffr::expect_doppelganger("sbc both direction regression plot", p2$plot)
-})
\ No newline at end of file
+})
diff --git a/vignettes/variable_selection.Rmd b/vignettes/variable_selection.Rmd
index 8b9c5eb..6147a70 100644
--- a/vignettes/variable_selection.Rmd
+++ b/vignettes/variable_selection.Rmd
@@ -23,7 +23,15 @@ library(goftest)
## Introduction
-## All Possible Regression
+Variable selection refers to the process of choosing the most relevant variables to include in a
+regression model. It helps to improve model performance and avoid overfitting.
+
+Before we explore stepwise selection methods, let us take a quick look at all/best subset regression.
+As they evaluate every possible variable combination, these methods are computationally intensive and may
+crash your system if used with a large set of variables. We have included them in the package purely for
+educational purposes.
+
+### All Possible Regression
All subset regression tests all possible subsets of the set of potential independent variables. If there are K potential independent variables (besides the constant), then there are $2^{k}$ distinct subsets of them to be tested. For example, if you have 10 candidate independent variables, the number of subsets to be tested is $2^{10}$, which is 1024, and if you have 20 candidate variables, the number is $2^{20}$, which is more than one million.
@@ -32,15 +40,7 @@ model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_step_all_possible(model)
```
-The `plot` method shows the panel of fit criteria for all possible regression methods.
-
-```{r allsubplot, fig.width=10, fig.height=10, fig.align='center'}
-model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
-k <- ols_step_all_possible(model)
-plot(k)
-```
-
-## Best Subset Regression
+### Best Subset Regression
Select the subset of predictors that do the best at meeting some well-defined objective criterion,
such as having the largest R2 value or the smallest MSE, Mallow's Cp or AIC.
@@ -50,188 +50,134 @@ model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
ols_step_best_subset(model)
```
-The `plot` method shows the panel of fit criteria for best subset regression methods.
-
-```{r bestsubplot, fig.width=10, fig.height=10, fig.align='center'}
-model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
-k <- ols_step_best_subset(model)
-plot(k)
-```
+## Stepwise Selection
-## Stepwise Forward Regression
+Stepwise regression is a method of fitting regression models that involves the
+iterative selection of independent variables to use in a model. It can be
+achieved through forward selection, backward elimination, or a combination of
+both methods. The forward selection approach starts with no variables and adds
+each new variable incrementally, testing for statistical significance, while
+the backward elimination method begins with a full model and then removes the
+least statistically significant variables one at a time.
-Build regression model from a set of candidate predictor variables by entering predictors based on
-p values, in a stepwise manner until there is no variable left to enter any more. The model should include all the candidate predictor variables. If details is set to `TRUE`, each step is displayed.
+### Model
-### Variable Selection
+We will use the model below throughout this article, except in the case of hierarchical selection.
+You can learn more about the data [here](https://olsrr.rsquaredacademy.com/reference/surgical).
-```{r stepf1}
-# stepwise forward regression
+```{r model}
model <- lm(y ~ ., data = surgical)
-ols_step_forward_p(model)
+summary(model)
```
-### Plot
+### Model specification
-```{r stepf2, fig.width=10, fig.height=10, fig.align='center'}
-model <- lm(y ~ ., data = surgical)
-k <- ols_step_forward_p(model)
-plot(k)
-```
+Irrespective of the stepwise method being used, we have to specify the full model, i.e. all the variables/predictors
+under consideration, as `olsrr` extracts the candidate variables for selection/elimination from the specified model.
-### Detailed Output
+##### Forward selection
-```{r stepwisefdetails}
+```{r stepf1}
# stepwise forward regression
-model <- lm(y ~ ., data = surgical)
-ols_step_forward_p(model, details = TRUE)
+ols_step_forward_p(model)
```
-## Stepwise Backward Regression
+##### Backward elimination
-Build regression model from a set of candidate predictor variables by removing predictors based on
-p values, in a stepwise manner until there is no variable left to remove any more. The model should include all the candidate predictor variables. If details is set to `TRUE`, each step is displayed.
-
-### Variable Selection
-
-```{r stepb, fig.width=10, fig.height=10, fig.align='center'}
+```{r stepb}
# stepwise backward regression
-model <- lm(y ~ ., data = surgical)
ols_step_backward_p(model)
```
-### Plot
-
-```{r stepb2, fig.width=10, fig.height=10, fig.align='center'}
-model <- lm(y ~ ., data = surgical)
-k <- ols_step_backward_p(model)
-plot(k)
-```
-
-### Detailed Output
-
-```{r stepwisebdetails}
-# stepwise backward regression
-model <- lm(y ~ ., data = surgical)
-ols_step_backward_p(model, details = TRUE)
-```
-
-## Stepwise Regression
+### Criteria
-Build regression model from a set of candidate predictor variables by entering and removing predictors based on
-p values, in a stepwise manner until there is no variable left to enter or remove any more. The model should include all the candidate predictor variables. If details is set to `TRUE`, each step is displayed.
+The criteria for selecting variables may be one of the following:
-### Variable Selection
+- p value
+- Akaike information criterion (AIC)
+- Schwarz Bayesian criterion (SBC)
+- Sawa Bayesian criterion (SBIC)
+- R-squared
+- adjusted R-squared
+
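+Each criterion has a corresponding set of functions. For example, assuming a recent
+version of `olsrr`, the AIC and SBC variants below are the counterparts of the p value
+based functions used elsewhere in this article:
+
+```{r criteria_functions}
+# forward selection using AIC
+ols_step_forward_aic(model)
+
+# forward selection using SBC
+ols_step_forward_sbc(model)
+```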
+### Include/exclude variables
-```{r stepwise1}
-# stepwise regression
-model <- lm(y ~ ., data = surgical)
-ols_step_both_p(model)
-```
+We can force variables to be included in or excluded from the model at all stages of variable selection. The
+variables may be specified either by name or by their position in the specified model.
-### Plot
+##### By name
-```{r stepwise2, fig.width=10, fig.height=10, fig.align='center'}
-model <- lm(y ~ ., data = surgical)
-k <- ols_step_both_p(model)
-plot(k)
+```{r include_name}
+ols_step_forward_p(model, include = c("age", "alc_mod"))
```
-### Detailed Output
+##### By index
-```{r stepwisedetails}
-# stepwise regression
-model <- lm(y ~ ., data = surgical)
-ols_step_both_p(model, details = TRUE)
+```{r include_index}
+ols_step_forward_p(model, include = c(5, 7))
```
-## Stepwise AIC Forward Regression
-
-Build regression model from a set of candidate predictor variables by entering predictors based on
-Akaike Information Criteria, in a stepwise manner until there is no variable left to enter any more.
-The model should include all the candidate predictor variables. If details is set to `TRUE`, each step is displayed.
+### Standardized output
-### Variable Selection
+All stepwise selection methods display standardized output that includes:
-```{r stepaicf1}
-# stepwise aic forward regression
-model <- lm(y ~ ., data = surgical)
-ols_step_forward_aic(model)
-```
+- selection summary
+- model summary
+- ANOVA
+- parameter estimates
-### Plot
-
-```{r stepaicf2, fig.width=5, fig.height=5, fig.align='center'}
-model <- lm(y ~ ., data = surgical)
-k <- ols_step_forward_aic(model)
-plot(k)
+```{r output}
+# adjusted r-square
+ols_step_forward_adj_r2(model)
```
-### Detailed Output
-
-```{r stepwiseaicfdetails}
-# stepwise aic forward regression
-model <- lm(y ~ ., data = surgical)
-ols_step_forward_aic(model, details = TRUE)
-```
+### Visualization
-## Stepwise AIC Backward Regression
+Use the `plot()` method to visualize variable selection. It displays how the selection criterion
+changes at each step of the selection process, along with the variable selected.
-Build regression model from a set of candidate predictor variables by removing predictors based on
-Akaike Information Criteria, in a stepwise manner until there is no variable left to remove any more.
-The model should include all the candidate predictor variables. If details is set to `TRUE`, each step is displayed.
-
-### Variable Selection
-
-```{r stepaicb1}
-# stepwise aic backward regression
-model <- lm(y ~ ., data = surgical)
-k <- ols_step_backward_aic(model)
-k
+```{r visualize}
+# adjusted r-square
+k <- ols_step_forward_adj_r2(model)
+plot(k)
```
-### Plot
+### Verbose output
-```{r stepaicb2, fig.width=5, fig.height=5, fig.align='center'}
-model <- lm(y ~ ., data = surgical)
-k <- ols_step_backward_aic(model)
-plot(k)
-```
+To view the detailed regression output at each stage of variable selection/elimination, set `details` to `TRUE`. It will
+display the following information at each step:
-### Detailed Output
+- step number
+- variable selected/eliminated
+- model
+- value of the criteria at that stage
-```{r stepwiseaicbdetails}
-# stepwise aic backward regression
-model <- lm(y ~ ., data = surgical)
-ols_step_backward_aic(model, details = TRUE)
+```{r details}
+# adjusted r-square
+ols_step_forward_adj_r2(model, details = TRUE)
```
-## Stepwise AIC Regression
+### Progress
-Build regression model from a set of candidate predictor variables by entering and removing predictors based on
-Akaike Information Criteria, in a stepwise manner until there is no variable left to enter or remove any more.
-The model should include all the candidate predictor variables. If details is set to `TRUE`, each step is displayed.
+To view the progress of the variable selection procedure, set `progress` to `TRUE`. It will display the variable
+being selected/eliminated at each step until no candidate variables are left.
-### Variable Selection
-
-```{r stepwiseaic1}
-# stepwise aic regression
-model <- lm(y ~ ., data = surgical)
-ols_step_both_aic(model)
+```{r progress}
+# adjusted r-square
+ols_step_forward_adj_r2(model, progress = TRUE)
```
-### Plot
+### Hierarchical selection
-```{r stepwiseaic2, fig.width=5, fig.height=5, fig.align='center'}
-model <- lm(y ~ ., data = surgical)
-k <- ols_step_both_aic(model)
-plot(k)
-```
-
-### Detailed Output
+When using `p` values as the criterion for selecting/eliminating variables, we can enable hierarchical
+selection. In this method, the search for the most significant variable is restricted to the next available
+variable. In the example below, since `liver_test` does not meet the threshold for selection, none of the
+variables after `liver_test` are considered either, i.e. the stepwise selection ends as soon as it comes
+across a variable that does not meet the selection threshold. You can learn more about hierarchical
+selection [here](https://www.stata.com/manuals/rstepwise.pdf).
-```{r stepwiseaicdetails}
-# stepwise aic regression
-model <- lm(y ~ ., data = surgical)
-ols_step_both_aic(model, details = TRUE)
+```{r hierarchical}
+# hierarchical selection
+m <- lm(y ~ bcs + alc_heavy + pindex + enzyme_test + liver_test + age + gender + alc_mod, data = surgical)
+ols_step_forward_p(m, 0.1, hierarchical = TRUE)
```