diff --git a/R/ds-auto-summary.R b/R/ds-auto-summary.R
index 97e2abd..eac67ae 100644
--- a/R/ds-auto-summary.R
+++ b/R/ds-auto-summary.R
@@ -1,6 +1,6 @@
-#' Summary statistics
+#' Descriptive statistics and frequency tables
 #'
-#' Generate summary statistics for all continuous variables in data.
+#' Generate summary statistics and frequency tables for all continuous variables in data.
 #'
 #' @param data A \code{data.frame} or \code{tibble}.
 #' @param ... Column(s) in \code{data}.
diff --git a/R/ds-multistats.R b/R/ds-multistats.R
index b02238a..2a7b41a 100644
--- a/R/ds-multistats.R
+++ b/R/ds-multistats.R
@@ -1,4 +1,4 @@
-#' Multiple variable statistics
+#' Tidy descriptive statistics
 #'
 #' Descriptive statistics for multiple variables.
 #'
diff --git a/R/ds-plots.R b/R/ds-plots.R
index 6056e42..9cfb8f9 100644
--- a/R/ds-plots.R
+++ b/R/ds-plots.R
@@ -7,7 +7,7 @@
 #'
 #' @examples
 #' ds_plot_scatter(mtcarz)
-#' ds_plot_scatter(mtcarz, mpg, disp, hp)
+#' ds_plot_scatter(mtcarz, mpg, disp)
 #'
 #' @importFrom rlang sym
 #' @importFrom utils combn
@@ -424,7 +424,7 @@ ds_plot_bar_grouped <- function(data, ...) {
 }
 
 
-#' Compate distributions
+#' Compare distributions
 #'
 #' Creates box plots if the data has both categorical & continuous variables.
 #'
diff --git a/_pkgdown.yml b/_pkgdown.yml
index 99438ea..e694d69 100644
--- a/_pkgdown.yml
+++ b/_pkgdown.yml
@@ -65,6 +65,7 @@ reference:
   - ds_auto_summary_stats
   - ds_summary_stats
   - ds_tidy_stats
+  - ds_freq_table
   - ds_measures_location
   - ds_measures_variation
   - ds_measures_symmetry
diff --git a/cran-comments.md b/cran-comments.md
index 8b70a39..a74ddef 100644
--- a/cran-comments.md
+++ b/cran-comments.md
@@ -1,16 +1,16 @@
-This is a patch release for bug fixes.
-
 ## Test environments
-* local Windows 10, R 3.4.4
-* ubuntu 12.04 (on travis-ci), R 3.3.3, R 3.4.4, R-devel
+* local Windows 10, R 3.5.1
+* ubuntu 14.04 (on travis-ci), R 3.4.4, R 3.5.2, R-devel
 * win-builder (devel and release)
 
 ## R CMD check results
 
-0 errors | 0 warnings | 0 note
+0 errors | 0 warnings | 1 note
+
+* There was 1 NOTE about ORCID ID in R 3.4.4
 
 ## Reverse dependencies
 
-We checked 2 dependencies (olsrr and inferr) and it returned a NOTE.
+We checked 4 dependencies (olsrr, blorr, xplorerr and inferr) and it returned a NOTE.
diff --git a/docs/CNAME b/docs/CNAME
new file mode 100644
index 0000000..3c70af6
--- /dev/null
+++ b/docs/CNAME
@@ -0,0 +1 @@
+descriptr.rsquaredacademy.com
diff --git a/docs/CONDUCT.html b/docs/CONDUCT.html
index 33090b4..3c67a79 100644
--- a/docs/CONDUCT.html
+++ b/docs/CONDUCT.html
@@ -60,7 +60,7 @@
@@ -77,10 +77,13 @@
diff --git a/docs/LICENSE-text.html b/docs/LICENSE-text.html
index e8a4872..a557243 100644
--- a/docs/LICENSE-text.html
+++ b/docs/LICENSE-text.html
@@ -60,7 +60,7 @@
@@ -77,10 +77,13 @@
diff --git a/docs/articles/categorical-data.html b/docs/articles/categorical-data.html
new file mode 100644
index 0000000..d85bc8a
--- /dev/null
+++ b/docs/articles/categorical-data.html
@@ -0,0 +1,424 @@
+vignettes/categorical-data.Rmd
+ categorical-data.Rmd
In this document, we will introduce you to functions for exploring and visualizing categorical data.
+We have modified the mtcars data to create a new data set, mtcarz. The two data sets differ only in their variable types: in mtcarz, cyl, vs, am, gear and carb are factors.
str(mtcarz)
+#> 'data.frame': 32 obs. of 11 variables:
+#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
+#> $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
+#> $ disp: num 160 160 108 258 360 ...
+#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
+#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
+#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
+#> $ qsec: num 16.5 17 18.6 19.4 17 ...
+#> $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
+#> $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
+#> $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
+#> $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
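The str() output above suggests how mtcarz relates to mtcars: five integer-coded columns become factors. A minimal base-R sketch of that conversion follows; `mtcarz_sketch` and `factor_cols` are hypothetical names, and the sketch assumes no other change was made to the data.

```r
# Hypothetical reconstruction of mtcarz from the built-in mtcars data.
mtcarz_sketch <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")

# Convert the integer-coded columns to factors; levels are the sorted unique values.
mtcarz_sketch[factor_cols] <- lapply(mtcarz_sketch[factor_cols], factor)

str(mtcarz_sketch)
```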
The ds_cross_table() function creates two-way tables of categorical variables.
ds_cross_table(mtcarz, cyl, gear)
+#> Cell Contents
+#> |---------------|
+#> | Frequency |
+#> | Percent |
+#> | Row Pct |
+#> | Col Pct |
+#> |---------------|
+#>
+#> Total Observations: 32
+#>
+#> ----------------------------------------------------------------------------
+#> | | gear |
+#> ----------------------------------------------------------------------------
+#> | cyl | 3 | 4 | 5 | Row Total |
+#> ----------------------------------------------------------------------------
+#> | 4 | 1 | 8 | 2 | 11 |
+#> | | 0.031 | 0.25 | 0.062 | |
+#> | | 0.09 | 0.73 | 0.18 | 0.34 |
+#> | | 0.07 | 0.67 | 0.4 | |
+#> ----------------------------------------------------------------------------
+#> | 6 | 2 | 4 | 1 | 7 |
+#> | | 0.062 | 0.125 | 0.031 | |
+#> | | 0.29 | 0.57 | 0.14 | 0.22 |
+#> | | 0.13 | 0.33 | 0.2 | |
+#> ----------------------------------------------------------------------------
+#> | 8 | 12 | 0 | 2 | 14 |
+#> | | 0.375 | 0 | 0.062 | |
+#> | | 0.86 | 0 | 0.14 | 0.44 |
+#> | | 0.8 | 0 | 0.4 | |
+#> ----------------------------------------------------------------------------
+#> | Column Total | 15 | 12 | 5 | 32 |
+#> | | 0.468 | 0.375 | 0.155 | |
+#> ----------------------------------------------------------------------------
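Each cell of the cross table above stacks four numbers: the count, its share of all observations, the row proportion and the column proportion. A base-R sketch that reproduces them with table() and prop.table() is shown below; it is meant for checking the numbers and is not how ds_cross_table() is implemented. It reads the built-in mtcars columns directly.

```r
# Two-way frequency table of cylinders by gears.
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

cell_pct <- prop.table(counts)     # share of all 32 observations
row_pct  <- prop.table(counts, 1)  # proportions within each row (cyl)
col_pct  <- prop.table(counts, 2)  # proportions within each column (gear)

counts
round(row_pct, 2)
```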
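If you want the above result as a tibble, use ds_twoway_table(). Combinations with a zero count (here, cyl 8 with gear 4) are omitted, which is why the tibble has 8 rows rather than 9.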
ds_twoway_table(mtcarz, cyl, gear)
+#> Joining, by = c("cyl", "gear", "count")
+#> # A tibble: 8 x 6
+#> cyl gear count percent row_percent col_percent
+#> <fct> <fct> <int> <dbl> <dbl> <dbl>
+#> 1 4 3 1 0.0312 0.0909 0.0667
+#> 2 4 4 8 0.25 0.727 0.667
+#> 3 4 5 2 0.0625 0.182 0.4
+#> 4 6 3 2 0.0625 0.286 0.133
+#> 5 6 4 4 0.125 0.571 0.333
+#> 6 6 5 1 0.0312 0.143 0.2
+#> 7 8 3 12 0.375 0.857 0.8
+#> 8 8 5 2 0.0625 0.143 0.4
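The tibble above is the cross table in long format: one row per observed cyl/gear combination. The following base-R sketch builds the same layout, under the assumption that empty combinations are simply dropped; the column names mirror the tibble, but the code is illustrative and not descriptr's own. Rows come out in a different order (cyl varies fastest).

```r
counts <- table(cyl = mtcars$cyl, gear = mtcars$gear)

# One row per combination, with the count in a column named "count".
long <- as.data.frame(counts, responseName = "count")
long <- long[long$count > 0, ]  # drop combinations that never occur

long$percent     <- long$count / sum(long$count)
# Divide by the row (cyl) and column (gear) totals, respectively.
long$row_percent <- long$count / ave(long$count, long$cyl, FUN = sum)
long$col_percent <- long$count / ave(long$count, long$gear, FUN = sum)
long
```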
A plot() method has been defined which will generate:
The ds_freq_table() function creates frequency tables. For a categorical variable, it counts the observations in each level.
ds_freq_table(mtcarz, cyl)
+#> Variable: cyl
+#> -----------------------------------------------------------------------
+#> Levels Frequency Cum Frequency Percent Cum Percent
+#> -----------------------------------------------------------------------
+#> 4 11 11 34.38 34.38
+#> -----------------------------------------------------------------------
+#> 6 7 18 21.88 56.25
+#> -----------------------------------------------------------------------
+#> 8 14 32 43.75 100
+#> -----------------------------------------------------------------------
+#> Total 32 - 100.00 -
+#> -----------------------------------------------------------------------
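The columns of the frequency table above follow directly from the level counts: cumulative frequency is a running sum, and the percentages divide by the total. A short base-R sketch of that arithmetic for cyl (illustrative only; the data frame layout is an assumption, not the ds_freq_table() return value):

```r
freq <- table(mtcars$cyl)  # counts per level, in level order
n <- as.vector(freq)

freq_df <- data.frame(
  level         = names(freq),
  frequency     = n,
  cum_frequency = cumsum(n),          # running total of counts
  percent       = 100 * n / sum(n)
)
freq_df$cum_percent <- 100 * freq_df$cum_frequency / sum(n)

# Round only for display, as the printed table does.
transform(freq_df, percent = round(percent, 2), cum_percent = round(cum_percent, 2))
```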
A plot() method has been defined which will create a bar plot.
The ds_auto_freq_table() function creates a one-way frequency table for each categorical variable in a data set. If you do not want to use all the variables, you can specify a subset.
ds_auto_freq_table(mtcarz)
+#> Variable: cyl
+#> -----------------------------------------------------------------------
+#> Levels Frequency Cum Frequency Percent Cum Percent
+#> -----------------------------------------------------------------------
+#> 4 11 11 34.38 34.38
+#> -----------------------------------------------------------------------
+#> 6 7 18 21.88 56.25
+#> -----------------------------------------------------------------------
+#> 8 14 32 43.75 100
+#> -----------------------------------------------------------------------
+#> Total 32 - 100.00 -
+#> -----------------------------------------------------------------------
+#>
+#> Variable: vs
+#> -----------------------------------------------------------------------
+#> Levels Frequency Cum Frequency Percent Cum Percent
+#> -----------------------------------------------------------------------
+#> 0 18 18 56.25 56.25
+#> -----------------------------------------------------------------------
+#> 1 14 32 43.75 100
+#> -----------------------------------------------------------------------
+#> Total 32 - 100.00 -
+#> -----------------------------------------------------------------------
+#>
+#> Variable: am
+#> -----------------------------------------------------------------------
+#> Levels Frequency Cum Frequency Percent Cum Percent
+#> -----------------------------------------------------------------------
+#> 0 19 19 59.38 59.38
+#> -----------------------------------------------------------------------
+#> 1 13 32 40.62 100
+#> -----------------------------------------------------------------------
+#> Total 32 - 100.00 -
+#> -----------------------------------------------------------------------
+#>
+#> Variable: gear
+#> -----------------------------------------------------------------------
+#> Levels Frequency Cum Frequency Percent Cum Percent
+#> -----------------------------------------------------------------------
+#> 3 15 15 46.88 46.88
+#> -----------------------------------------------------------------------
+#> 4 12 27 37.5 84.38
+#> -----------------------------------------------------------------------
+#> 5 5 32 15.62 100
+#> -----------------------------------------------------------------------
+#> Total 32 - 100.00 -
+#> -----------------------------------------------------------------------
+#>
+#> Variable: carb
+#> -----------------------------------------------------------------------
+#> Levels Frequency Cum Frequency Percent Cum Percent
+#> -----------------------------------------------------------------------
+#> 1 7 7 21.88 21.88
+#> -----------------------------------------------------------------------
+#> 2 10 17 31.25 53.12
+#> -----------------------------------------------------------------------
+#> 3 3 20 9.38 62.5
+#> -----------------------------------------------------------------------
+#> 4 10 30 31.25 93.75
+#> -----------------------------------------------------------------------
+#> 6 1 31 3.12 96.88
+#> -----------------------------------------------------------------------
+#> 8 1 32 3.12 100
+#> -----------------------------------------------------------------------
+#> Total 32 - 100.00 -
+#> -----------------------------------------------------------------------
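The "one table per categorical variable" behaviour amounts to detecting the factor columns and tabulating each one. A base-R sketch of that loop is shown below; it assumes that "categorical" means "factor", and it rebuilds a hypothetical `mtcarz_sketch` so that the block is self-contained.

```r
# Hypothetical stand-in for mtcarz: mtcars with five columns turned into factors.
mtcarz_sketch <- mtcars
factor_cols <- c("cyl", "vs", "am", "gear", "carb")
mtcarz_sketch[factor_cols] <- lapply(mtcarz_sketch[factor_cols], factor)

# Detect the categorical (factor) columns, then build one frequency table each.
is_cat <- vapply(mtcarz_sketch, is.factor, logical(1))
freq_tables <- lapply(mtcarz_sketch[is_cat], function(x) {
  n <- table(x)
  data.frame(level = names(n), frequency = as.vector(n),
             cum_frequency = cumsum(as.vector(n)),
             percent = 100 * as.vector(n) / sum(n))
})

names(freq_tables)
freq_tables$carb
```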
The ds_auto_cross_table() function creates a two-way table for each unique pair of categorical variables in a data set. If you do not want to use all the variables, you can specify a subset, as in the example below with cyl, gear and am.
ds_auto_cross_table(mtcarz, cyl, gear, am)
+#> Cell Contents
+#> |---------------|
+#> | Frequency |
+#> | Percent |
+#> | Row Pct |
+#> | Col Pct |
+#> |---------------|
+#>
+#> Total Observations: 32
+#>
+#> cyl vs gear
+#> ----------------------------------------------------------------------------
+#> | | gear |
+#> ----------------------------------------------------------------------------
+#> | cyl | 3 | 4 | 5 | Row Total |
+#> ----------------------------------------------------------------------------
+#> | 4 | 1 | 8 | 2 | 11 |
+#> | | 0.031 | 0.25 | 0.062 | |
+#> | | 0.09 | 0.73 | 0.18 | 0.34 |
+#> | | 0.07 | 0.67 | 0.4 | |
+#> ----------------------------------------------------------------------------
+#> | 6 | 2 | 4 | 1 | 7 |
+#> | | 0.062 | 0.125 | 0.031 | |
+#> | | 0.29 | 0.57 | 0.14 | 0.22 |
+#> | | 0.13 | 0.33 | 0.2 | |
+#> ----------------------------------------------------------------------------
+#> | 8 | 12 | 0 | 2 | 14 |
+#> | | 0.375 | 0 | 0.062 | |
+#> | | 0.86 | 0 | 0.14 | 0.44 |
+#> | | 0.8 | 0 | 0.4 | |
+#> ----------------------------------------------------------------------------
+#> | Column Total | 15 | 12 | 5 | 32 |
+#> | | 0.468 | 0.375 | 0.155 | |
+#> ----------------------------------------------------------------------------
+#>
+#>
+#> cyl vs am
+#> -------------------------------------------------------------
+#> | | am |
+#> -------------------------------------------------------------
+#> | cyl | 0 | 1 | Row Total |
+#> -------------------------------------------------------------
+#> | 4 | 3 | 8 | 11 |
+#> | | 0.094 | 0.25 | |
+#> | | 0.27 | 0.73 | 0.34 |
+#> | | 0.16 | 0.62 | |
+#> -------------------------------------------------------------
+#> | 6 | 4 | 3 | 7 |
+#> | | 0.125 | 0.094 | |
+#> | | 0.57 | 0.43 | 0.22 |
+#> | | 0.21 | 0.23 | |
+#> -------------------------------------------------------------
+#> | 8 | 12 | 2 | 14 |
+#> | | 0.375 | 0.062 | |
+#> | | 0.86 | 0.14 | 0.44 |
+#> | | 0.63 | 0.15 | |
+#> -------------------------------------------------------------
+#> | Column Total | 19 | 13 | 32 |
+#> | | 0.594 | 0.406 | |
+#> -------------------------------------------------------------
+#>
+#>
+#> gear vs am
+#> -------------------------------------------------------------
+#> | | am |
+#> -------------------------------------------------------------
+#> | gear | 0 | 1 | Row Total |
+#> -------------------------------------------------------------
+#> | 3 | 15 | 0 | 15 |
+#> | | 0.469 | 0 | |
+#> | | 1 | 0 | 0.47 |
+#> | | 0.79 | 0 | |
+#> -------------------------------------------------------------
+#> | 4 | 4 | 8 | 12 |
+#> | | 0.125 | 0.25 | |
+#> | | 0.33 | 0.67 | 0.38 |
+#> | | 0.21 | 0.62 | |
+#> -------------------------------------------------------------
+#> | 5 | 0 | 5 | 5 |
+#> | | 0 | 0.156 | |
+#> | | 0 | 1 | 0.16 |
+#> | | 0 | 0.38 | |
+#> -------------------------------------------------------------
+#> | Column Total | 19 | 13 | 32 |
+#> | | 0.594 | 0.406 | |
+#> -------------------------------------------------------------
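The three tables above correspond to the three unordered pairs of cyl, gear and am, in the same order that utils::combn() produces them. Here is a base-R sketch of that pairing step. The " vs " labels imitate the printed headings, but the code itself is illustrative rather than descriptr's implementation.

```r
vars <- c("cyl", "gear", "am")

# Every unordered pair: (cyl, gear), (cyl, am), (gear, am).
pairs <- combn(vars, 2, simplify = FALSE)

# One count table per pair, with the variable names as dimnames.
cross_tables <- lapply(pairs, function(p) table(mtcars[[p[1]]], mtcars[[p[2]]], dnn = p))
names(cross_tables) <- vapply(pairs, paste, character(1), collapse = " vs ")

names(cross_tables)
cross_tables[["gear vs am"]]
```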
vignettes/continuous-data.Rmd
+ continuous-data.Rmd
This document introduces a basic set of functions that describe continuous data. The other two vignettes introduce functions that describe categorical data and visualization options.
+We have modified the mtcars data to create a new data set, mtcarz. The two data sets differ only in their variable types: in mtcarz, cyl, vs, am, gear and carb are factors.
str(mtcarz)
+#> 'data.frame': 32 obs. of 11 variables:
+#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
+#> $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
+#> $ disp: num 160 160 108 258 360 ...
+#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
+#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
+#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
+#> $ qsec: num 16.5 17 18.6 19.4 17 ...
+#> $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
+#> $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
+#> $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
+#> $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
The ds_screener() function screens a data set and returns the column (variable) names, the data types, the levels (for categorical data), and the number and percentage of missing observations.
ds_screener(mtcarz)
+#> -----------------------------------------------------------------------
+#> | Column Name | Data Type | Levels | Missing | Missing (%) |
+#> -----------------------------------------------------------------------
+#> | mpg | numeric | NA | 0 | 0 |
+#> | cyl | factor | 4 6 8 | 0 | 0 |
+#> | disp | numeric | NA | 0 | 0 |
+#> | hp | numeric | NA | 0 | 0 |
+#> | drat | numeric | NA | 0 | 0 |
+#> | wt | numeric | NA | 0 | 0 |
+#> | qsec | numeric | NA | 0 | 0 |
+#> | vs | factor | 0 1 | 0 | 0 |
+#> | am | factor | 0 1 | 0 | 0 |
+#> | gear | factor | 3 4 5 | 0 | 0 |
+#> | carb | factor |1 2 3 4 6 8| 0 | 0 |
+#> -----------------------------------------------------------------------
+#>
+#> Overall Missing Values 0
+#> Percentage of Missing Values 0 %
+#> Rows with Missing Values 0
+#> Columns With Missing Values 0
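The screener's columns are simple per-column summaries. A base-R sketch that produces the same fields follows. The names `df` and `screen` are hypothetical, and the factor conversion assumes that mtcarz differs from mtcars only in column types.

```r
# Hypothetical stand-in for mtcarz.
df <- mtcars
fct <- c("cyl", "vs", "am", "gear", "carb")
df[fct] <- lapply(df[fct], factor)

screen <- data.frame(
  column  = names(df),
  type    = vapply(df, function(x) class(x)[1], character(1)),
  levels  = vapply(df, function(x)
    if (is.factor(x)) paste(levels(x), collapse = " ") else NA_character_,
    character(1)),
  missing = vapply(df, function(x) sum(is.na(x)), integer(1)),
  row.names = NULL
)
screen$missing_pct <- 100 * screen$missing / nrow(df)
screen

sum(!complete.cases(df))  # rows with at least one missing value
```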
The ds_summary_stats() function returns a comprehensive set of statistics including measures of location, variation, symmetry and extreme observations.
ds_summary_stats(mtcarz, mpg)
+#> ------------------------------ Variable: mpg ------------------------------
+#>
+#> Univariate Analysis
+#>
+#> N 32.00 Variance 36.32
+#> Missing 0.00 Std Deviation 6.03
+#> Mean 20.09 Range 23.50
+#> Median 19.20 Interquartile Range 7.38
+#> Mode 10.40 Uncorrected SS 14042.31
+#> Trimmed Mean 19.95 Corrected SS 1126.05
+#> Skewness 0.67 Coeff Variation 30.00
+#> Kurtosis -0.02 Std Error Mean 1.07
+#>
+#> Quantiles
+#>
+#> Quantile Value
+#>
+#> Max 33.90
+#> 99% 33.44
+#> 95% 31.30
+#> 90% 30.09
+#> Q3 22.80
+#> Median 19.20
+#> Q1 15.43
+#> 10% 14.34
+#> 5% 12.00
+#> 1% 10.40
+#> Min 10.40
+#>
+#> Extreme Values
+#>
+#> Low High
+#>
+#> Obs Value Obs Value
+#> 15 10.4 20 33.9
+#> 16 10.4 18 32.4
+#> 24 13.3 19 30.4
+#> 7 14.3 28 30.4
+#> 17 14.7 26 27.3
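Several of the less familiar entries in the output above have short definitions. A base-R sketch for mpg is given below. It assumes that the trimmed mean removes 5% from each end, which is consistent with the 19.95 shown, and that the coefficient of variation is expressed as a percentage; these are inferences from the output, not statements about descriptr's code.

```r
x <- mtcars$mpg
n <- length(x)

summ <- c(
  mean           = mean(x),
  trimmed_mean   = mean(x, trim = 0.05),    # assumption: 5% trimmed from each end
  variance       = var(x),
  sd             = sd(x),
  uncorrected_ss = sum(x^2),                # sum of squares about zero
  corrected_ss   = sum((x - mean(x))^2),    # sum of squares about the mean
  coeff_var      = 100 * sd(x) / mean(x),   # sd as a percentage of the mean
  std_error      = sd(x) / sqrt(n),         # standard error of the mean
  iqr            = IQR(x)
)
round(summ, 2)
```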
You can pass multiple variables as shown below:
+ds_summary_stats(mtcarz, mpg, disp)
+#> ------------------------------ Variable: mpg ------------------------------
+#>
+#> Univariate Analysis
+#>
+#> N 32.00 Variance 36.32
+#> Missing 0.00 Std Deviation 6.03
+#> Mean 20.09 Range 23.50
+#> Median 19.20 Interquartile Range 7.38
+#> Mode 10.40 Uncorrected SS 14042.31
+#> Trimmed Mean 19.95 Corrected SS 1126.05
+#> Skewness 0.67 Coeff Variation 30.00
+#> Kurtosis -0.02 Std Error Mean 1.07
+#>
+#> Quantiles
+#>
+#> Quantile Value
+#>
+#> Max 33.90
+#> 99% 33.44
+#> 95% 31.30
+#> 90% 30.09
+#> Q3 22.80
+#> Median 19.20
+#> Q1 15.43
+#> 10% 14.34
+#> 5% 12.00
+#> 1% 10.40
+#> Min 10.40
+#>
+#> Extreme Values
+#>
+#> Low High
+#>
+#> Obs Value Obs Value
+#> 15 10.4 20 33.9
+#> 16 10.4 18 32.4
+#> 24 13.3 19 30.4
+#> 7 14.3 28 30.4
+#> 17 14.7 26 27.3
+#>
+#>
+#>
+#> ------------------------------ Variable: disp -----------------------------
+#>
+#> Univariate Analysis
+#>
+#> N 32.00 Variance 15360.80
+#> Missing 0.00 Std Deviation 123.94
+#> Mean 230.72 Range 400.90
+#> Median 196.30 Interquartile Range 205.18
+#> Mode 275.80 Uncorrected SS 2179627.47
+#> Trimmed Mean 228.00 Corrected SS 476184.79
+#> Skewness 0.42 Coeff Variation 53.72
+#> Kurtosis -1.07 Std Error Mean 21.91
+#>
+#> Quantiles
+#>
+#> Quantile Value
+#>
+#> Max 472.00
+#> 99% 468.28
+#> 95% 449.00
+#> 90% 396.00
+#> Q3 326.00
+#> Median 196.30
+#> Q1 120.83
+#> 10% 80.61
+#> 5% 77.35
+#> 1% 72.53
+#> Min 71.10
+#>
+#> Extreme Values
+#>
+#> Low High
+#>
+#> Obs Value Obs Value
+#> 20 71.1 15 472
+#> 19 75.7 16 460
+#> 18 78.7 17 440
+#> 26 79 25 400
+#> 28 95.1 5 360
If you do not specify any variables, ds_summary_stats() detects all the continuous variables in the data set and returns summary statistics for each of them.
+The ds_freq_table() function also creates frequency tables for continuous variables by splitting their range into equal-width intervals. The default number of intervals is 5; the example below requests 4.
ds_freq_table(mtcarz, mpg, 4)
+#> Variable: mpg
+#> |---------------------------------------------------------------------------|
+#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent |
+#> |---------------------------------------------------------------------------|
+#> | 10.4 - 16.3 | 10 | 10 | 31.25 | 31.25 |
+#> |---------------------------------------------------------------------------|
+#> | 16.3 - 22.1 | 13 | 23 | 40.62 | 71.88 |
+#> |---------------------------------------------------------------------------|
+#> | 22.1 - 28 | 5 | 28 | 15.62 | 87.5 |
+#> |---------------------------------------------------------------------------|
+#> | 28 - 33.9 | 4 | 32 | 12.5 | 100 |
+#> |---------------------------------------------------------------------------|
+#> | Total | 32 | - | 100.00 | - |
+#> |---------------------------------------------------------------------------|
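The four bins above each span (33.9 - 10.4) / 4 = 5.875. The counts can be reproduced with base R's cut(), assuming equal-width, right-closed intervals with the lowest value included. This rule matches the output shown but is an inference rather than descriptr's documented algorithm.

```r
x <- mtcars$mpg
n_bins <- 4

# Equal-width breaks spanning the observed range.
breaks <- seq(min(x), max(x), length.out = n_bins + 1)
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

freq <- as.vector(table(bins))
data.frame(bin = levels(bins), frequency = freq,
           cum_frequency = cumsum(freq), percent = 100 * freq / length(x))
```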
A plot() method has been defined which will generate a histogram.
If you want to view summary statistics and frequency tables for all or a subset of the variables in a data set, use ds_auto_summary_stats().
ds_auto_summary_stats(mtcarz, disp, mpg)
+#> ------------------------------ Variable: disp -----------------------------
+#>
+#> ---------------------------- Summary Statistics ---------------------------
+#>
+#> ------------------------------ Variable: disp -----------------------------
+#>
+#> Univariate Analysis
+#>
+#> N 32.00 Variance 15360.80
+#> Missing 0.00 Std Deviation 123.94
+#> Mean 230.72 Range 400.90
+#> Median 196.30 Interquartile Range 205.18
+#> Mode 275.80 Uncorrected SS 2179627.47
+#> Trimmed Mean 228.00 Corrected SS 476184.79
+#> Skewness 0.42 Coeff Variation 53.72
+#> Kurtosis -1.07 Std Error Mean 21.91
+#>
+#> Quantiles
+#>
+#> Quantile Value
+#>
+#> Max 472.00
+#> 99% 468.28
+#> 95% 449.00
+#> 90% 396.00
+#> Q3 326.00
+#> Median 196.30
+#> Q1 120.83
+#> 10% 80.61
+#> 5% 77.35
+#> 1% 72.53
+#> Min 71.10
+#>
+#> Extreme Values
+#>
+#> Low High
+#>
+#> Obs Value Obs Value
+#> 20 71.1 15 472
+#> 19 75.7 16 460
+#> 18 78.7 17 440
+#> 26 79 25 400
+#> 28 95.1 5 360
+#>
+#>
+#>
+#> NULL
+#>
+#>
+#> -------------------------- Frequency Distribution -------------------------
+#>
+#> Variable: disp
+#> |---------------------------------------------------------------------------|
+#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent |
+#> |---------------------------------------------------------------------------|
+#> | 71.1 - 151.3 | 12 | 12 | 37.5 | 37.5 |
+#> |---------------------------------------------------------------------------|
+#> | 151.3 - 231.5 | 5 | 17 | 15.62 | 53.12 |
+#> |---------------------------------------------------------------------------|
+#> | 231.5 - 311.6 | 6 | 23 | 18.75 | 71.88 |
+#> |---------------------------------------------------------------------------|
+#> | 311.6 - 391.8 | 5 | 28 | 15.62 | 87.5 |
+#> |---------------------------------------------------------------------------|
+#> | 391.8 - 472 | 4 | 32 | 12.5 | 100 |
+#> |---------------------------------------------------------------------------|
+#> | Total | 32 | - | 100.00 | - |
+#> |---------------------------------------------------------------------------|
+#>
+#>
+#> ------------------------------ Variable: mpg ------------------------------
+#>
+#> ---------------------------- Summary Statistics ---------------------------
+#>
+#> ------------------------------ Variable: mpg ------------------------------
+#>
+#> Univariate Analysis
+#>
+#> N 32.00 Variance 36.32
+#> Missing 0.00 Std Deviation 6.03
+#> Mean 20.09 Range 23.50
+#> Median 19.20 Interquartile Range 7.38
+#> Mode 10.40 Uncorrected SS 14042.31
+#> Trimmed Mean 19.95 Corrected SS 1126.05
+#> Skewness 0.67 Coeff Variation 30.00
+#> Kurtosis -0.02 Std Error Mean 1.07
+#>
+#> Quantiles
+#>
+#> Quantile Value
+#>
+#> Max 33.90
+#> 99% 33.44
+#> 95% 31.30
+#> 90% 30.09
+#> Q3 22.80
+#> Median 19.20
+#> Q1 15.43
+#> 10% 14.34
+#> 5% 12.00
+#> 1% 10.40
+#> Min 10.40
+#>
+#> Extreme Values
+#>
+#> Low High
+#>
+#> Obs Value Obs Value
+#> 15 10.4 20 33.9
+#> 16 10.4 18 32.4
+#> 24 13.3 19 30.4
+#> 7 14.3 28 30.4
+#> 17 14.7 26 27.3
+#>
+#>
+#>
+#> NULL
+#>
+#>
+#> -------------------------- Frequency Distribution -------------------------
+#>
+#> Variable: mpg
+#> |-----------------------------------------------------------------------|
+#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent |
+#> |-----------------------------------------------------------------------|
+#> | 10.4 - 15.1 | 6 | 6 | 18.75 | 18.75 |
+#> |-----------------------------------------------------------------------|
+#> | 15.1 - 19.8 | 12 | 18 | 37.5 | 56.25 |
+#> |-----------------------------------------------------------------------|
+#> | 19.8 - 24.5 | 8 | 26 | 25 | 81.25 |
+#> |-----------------------------------------------------------------------|
+#> | 24.5 - 29.2 | 2 | 28 | 6.25 | 87.5 |
+#> |-----------------------------------------------------------------------|
+#> | 29.2 - 33.9 | 4 | 32 | 12.5 | 100 |
+#> |-----------------------------------------------------------------------|
+#> | Total | 32 | - | 100.00 | - |
+#> |-----------------------------------------------------------------------|
The ds_group_summary() function returns descriptive statistics of a continuous variable for the different levels of a categorical variable.
k <- ds_group_summary(mtcarz, cyl, mpg)
+k
+#> mpg by cyl
+#> -----------------------------------------------------------------------------------------
+#> | Statistic/Levels| 4| 6| 8|
+#> -----------------------------------------------------------------------------------------
+#> | Obs| 11| 7| 14|
+#> | Minimum| 21.4| 17.8| 10.4|
+#> | Maximum| 33.9| 21.4| 19.2|
+#> | Mean| 26.66| 19.74| 15.1|
+#> | Median| 26| 19.7| 15.2|
+#> | Mode| 22.8| 21| 10.4|
+#> | Std. Deviation| 4.51| 1.45| 2.56|
+#> | Variance| 20.34| 2.11| 6.55|
+#> | Skewness| 0.35| -0.26| -0.46|
+#> | Kurtosis| -1.43| -1.83| 0.33|
+#> | Uncorrected SS| 8023.83| 2741.14| 3277.34|
+#> | Corrected SS| 203.39| 12.68| 85.2|
+#> | Coeff Variation| 16.91| 7.36| 16.95|
+#> | Std. Error Mean| 1.36| 0.55| 0.68|
+#> | Range| 12.5| 3.6| 8.8|
+#> | Interquartile Range| 7.6| 2.35| 1.85|
+#> -----------------------------------------------------------------------------------------
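Grouped statistics like those above amount to splitting the continuous variable by the levels of the categorical one and summarizing each piece. A base-R sketch covering a few of the rows (observations, minimum, maximum, mean, median and standard deviation) follows; it is illustrative and does not reproduce descriptr's full table.

```r
# Split mpg into one vector per cyl level, then summarize each vector.
grp <- split(mtcars$mpg, mtcars$cyl)

group_stats <- t(vapply(grp, function(x) c(
  obs = length(x), min = min(x), max = max(x),
  mean = mean(x), median = median(x), sd = sd(x)
), numeric(6)))

round(group_stats, 2)
```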
ds_group_summary() also returns the statistics as a tibble, stored in the tidy_stats element, which can be used for further analysis.
k$tidy_stats
+#> # A tibble: 3 x 15
+#> cyl length min max mean median mode sd variance skewness
+#> <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 4 11 21.4 33.9 26.7 26 22.8 4.51 20.3 0.348
+#> 2 6 7 17.8 21.4 19.7 19.7 21 1.45 2.11 -0.259
+#> 3 8 14 10.4 19.2 15.1 15.2 10.4 2.56 6.55 -0.456
+#> # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>,
+#> # std_error <dbl>, range <dbl>, iqr <dbl>
A plot() method has been defined for comparing distributions.
If you want grouped summary statistics for multiple variables in a data set, use ds_auto_group_summary().
ds_auto_group_summary(mtcarz, cyl, gear, mpg)
+#> mpg by cyl
+#> -----------------------------------------------------------------------------------------
+#> | Statistic/Levels| 4| 6| 8|
+#> -----------------------------------------------------------------------------------------
+#> | Obs| 11| 7| 14|
+#> | Minimum| 21.4| 17.8| 10.4|
+#> | Maximum| 33.9| 21.4| 19.2|
+#> | Mean| 26.66| 19.74| 15.1|
+#> | Median| 26| 19.7| 15.2|
+#> | Mode| 22.8| 21| 10.4|
+#> | Std. Deviation| 4.51| 1.45| 2.56|
+#> | Variance| 20.34| 2.11| 6.55|
+#> | Skewness| 0.35| -0.26| -0.46|
+#> | Kurtosis| -1.43| -1.83| 0.33|
+#> | Uncorrected SS| 8023.83| 2741.14| 3277.34|
+#> | Corrected SS| 203.39| 12.68| 85.2|
+#> | Coeff Variation| 16.91| 7.36| 16.95|
+#> | Std. Error Mean| 1.36| 0.55| 0.68|
+#> | Range| 12.5| 3.6| 8.8|
+#> | Interquartile Range| 7.6| 2.35| 1.85|
+#> -----------------------------------------------------------------------------------------
+#>
+#>
+#>
+#> mpg by gear
+#> -----------------------------------------------------------------------------------------
+#> | Statistic/Levels| 3| 4| 5|
+#> -----------------------------------------------------------------------------------------
+#> | Obs| 15| 12| 5|
+#> | Minimum| 10.4| 17.8| 15|
+#> | Maximum| 21.5| 33.9| 30.4|
+#> | Mean| 16.11| 24.53| 21.38|
+#> | Median| 15.5| 22.8| 19.7|
+#> | Mode| 10.4| 21| 15|
+#> | Std. Deviation| 3.37| 5.28| 6.66|
+#> | Variance| 11.37| 27.84| 44.34|
+#> | Skewness| -0.09| 0.7| 0.56|
+#> | Kurtosis| -0.38| -0.77| -1.83|
+#> | Uncorrected SS| 4050.52| 7528.9| 2462.89|
+#> | Corrected SS| 159.15| 306.29| 177.37|
+#> | Coeff Variation| 20.93| 21.51| 31.15|
+#> | Std. Error Mean| 0.87| 1.52| 2.98|
+#> | Range| 11.1| 16.1| 15.4|
+#> | Interquartile Range| 3.9| 7.08| 10.2|
+#> -----------------------------------------------------------------------------------------
The ds_tidy_stats() function returns descriptive statistics for variables in a data frame or tibble, as a tibble.
ds_tidy_stats(mtcarz, mpg, disp, hp)
+#> # A tibble: 3 x 16
+#> vars min max mean t_mean median mode range variance stdev skew
+#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 disp 71.1 472 231. 228 196. 276. 401. 15361. 124. 0.420
+#> 2 hp 52 335 147. 144. 123 110 283 4701. 68.6 0.799
+#> 3 mpg 10.4 33.9 20.1 20.0 19.2 10.4 23.5 36.3 6.03 0.672
+#> # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>, q1 <dbl>,
+#> # q3 <dbl>, iqrange <dbl>
To view measures of location, variation and symmetry, percentiles and extreme observations as tibbles, use the functions below. All of them except ds_extreme_obs() work with single or multiple variables. If you do not specify any variables, they return results for all the continuous variables in the data set.
ds_measures_variation(mtcarz)
+#> # A tibble: 6 x 7
+#> var range iqr variance sd coeff_var std_error
+#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 disp 401. 205. 15361. 124. 53.7 21.9
+#> 2 drat 2.17 0.840 0.286 0.535 14.9 0.0945
+#> 3 hp 283 83.5 4701. 68.6 46.7 12.1
+#> 4 mpg 23.5 7.38 36.3 6.03 30.0 1.07
+#> 5 qsec 8.40 2.01 3.19 1.79 10.0 0.316
+#> 6 wt 3.91 1.03 0.957 0.978 30.4 0.173
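Each column of the variation tibble is a one-line computation per variable. A base-R sketch over the six continuous columns is shown below. It assumes that range means max minus min and that the coefficient of variation is a percentage, both of which are consistent with the values above.

```r
num <- mtcars[c("disp", "drat", "hp", "mpg", "qsec", "wt")]

variation <- t(vapply(num, function(x) c(
  range     = diff(range(x)),           # max - min
  iqr       = IQR(x),
  variance  = var(x),
  sd        = sd(x),
  coeff_var = 100 * sd(x) / mean(x),
  std_error = sd(x) / sqrt(length(x))
), numeric(6)))

round(variation, 3)
```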
ds_percentiles(mtcarz)
+#> # A tibble: 6 x 12
+#> var min per1 per5 per10 q1 median q3 per95 per90 per99
+#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
+#> 1 disp 71.1 72.5 77.4 80.6 121. 196. 326 449 396. 468.
+#> 2 drat 2.76 2.76 2.85 3.01 3.08 3.70 3.92 4.31 4.21 4.78
+#> 3 hp 52 55.1 63.6 66 96.5 123 180 254. 244. 313.
+#> 4 mpg 10.4 10.4 12.0 14.3 15.4 19.2 22.8 31.3 30.1 33.4
+#> 5 qsec 14.5 14.5 15.0 15.5 16.9 17.7 18.9 20.1 20.0 22.1
+#> 6 wt 1.51 1.54 1.74 1.96 2.58 3.32 3.61 5.29 4.05 5.40
+#> # ... with 1 more variable: max <dbl>
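The percentile values above agree with R's default quantile definition (type 7). For example, the 99th percentile of mpg is 32.4 + 0.69 × (33.9 - 32.4) = 33.435. The following sketch computes the same table with stats::quantile(), assuming type 7. It is a check on the numbers, not descriptr's code.

```r
probs <- c(0, 0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90, 0.95, 0.99, 1)
num <- mtcars[c("disp", "drat", "hp", "mpg", "qsec", "wt")]

# One row per variable, one column per requested percentile (type 7 is the default).
pct <- t(vapply(num, quantile, numeric(length(probs)), probs = probs))
round(pct, 2)
```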
vignettes/descriptive-stats.Rmd
- descriptive-stats.Rmd
Descriptive statistics are used to summarize data. They enable us to present the data in a more meaningful way and to discern any patterns existing in the data. They can be useful for two purposes:
-This document introduces you to a basic set of functions that describe data. There is a second vignette which provides details about functions which help visualize statistical distributions.
-We have modified the mtcars data to create a new data set mtcarz. The only difference between the two data sets is related to the variable types.
str(mtcarz)
## 'data.frame': 32 obs. of 11 variables:
-## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
-## $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
-## $ disp: num 160 160 108 258 360 ...
-## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
-## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
-## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
-## $ qsec: num 16.5 17 18.6 19.4 17 ...
-## $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
-## $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
-## $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
-## $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
-The ds_screener() function will screen a data set and return the following: - Column/Variable Names - Data Type - Levels (in case of categorical data) - Number of missing observations - % of missing observations
ds_screener(mtcarz)
## -----------------------------------------------------------------------
-## | Column Name | Data Type | Levels | Missing | Missing (%) |
-## -----------------------------------------------------------------------
-## | mpg | numeric | NA | 0 | 0 |
-## | cyl | factor | 4 6 8 | 0 | 0 |
-## | disp | numeric | NA | 0 | 0 |
-## | hp | numeric | NA | 0 | 0 |
-## | drat | numeric | NA | 0 | 0 |
-## | wt | numeric | NA | 0 | 0 |
-## | qsec | numeric | NA | 0 | 0 |
-## | vs | factor | 0 1 | 0 | 0 |
-## | am | factor | 0 1 | 0 | 0 |
-## | gear | factor | 3 4 5 | 0 | 0 |
-## | carb | factor |1 2 3 4 6 8| 0 | 0 |
-## -----------------------------------------------------------------------
-##
-## Overall Missing Values 0
-## Percentage of Missing Values 0 %
-## Rows with Missing Values 0
-## Columns With Missing Values 0
-The ds_summary_stats function returns a comprehensive set of statistics for continuous data.
ds_summary_stats(mtcarz, mpg)
## Univariate Analysis
-##
-## N 32.00 Variance 36.32
-## Missing 0.00 Std Deviation 6.03
-## Mean 20.09 Range 23.50
-## Median 19.20 Interquartile Range 7.38
-## Mode 10.40 Uncorrected SS 14042.31
-## Trimmed Mean 19.95 Corrected SS 1126.05
-## Skewness 0.67 Coeff Variation 30.00
-## Kurtosis -0.02 Std Error Mean 1.07
-##
-## Quantiles
-##
-## Quantile Value
-##
-## Max 33.90
-## 99% 33.44
-## 95% 31.30
-## 90% 30.09
-## Q3 22.80
-## Median 19.20
-## Q1 15.43
-## 10% 14.34
-## 5% 12.00
-## 1% 10.40
-## Min 10.40
-##
-## Extreme Values
-##
-## Low High
-##
-## Obs Value Obs Value
-## 15 10.4 20 33.9
-## 16 10.4 18 32.4
-## 24 13.3 19 30.4
-## 7 14.3 28 30.4
-## 17 14.7 26 27.3
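Several of the statistics above follow directly from textbook definitions. As a sketch, the base R code below reproduces a subset of them for mtcars$mpg; it illustrates the formulas and is not descriptr's own implementation.

```r
x <- mtcars$mpg
n <- length(x)

stats <- c(
  mean           = mean(x),
  variance       = var(x),                  # sample variance (n - 1 denominator)
  std_deviation  = sd(x),
  uncorrected_ss = sum(x^2),                # sum of squared values
  corrected_ss   = sum((x - mean(x))^2),    # sum of squared deviations from the mean
  coeff_var      = 100 * sd(x) / mean(x),   # coefficient of variation, in percent
  std_error_mean = sd(x) / sqrt(n),         # standard error of the mean
  range          = diff(range(x)),
  iqr            = IQR(x)
)
round(stats, 2)
```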
-The ds_cross_table()
function creates two-way tables of categorical variables.
ds_cross_table(mtcarz, cyl, gear)
## Cell Contents
-## |---------------|
-## | Frequency |
-## | Percent |
-## | Row Pct |
-## | Col Pct |
-## |---------------|
-##
-## Total Observations: 32
-##
-## ----------------------------------------------------------------------------
-## | | gear |
-## ----------------------------------------------------------------------------
-## | cyl | 3 | 4 | 5 | Row Total |
-## ----------------------------------------------------------------------------
-## | 4 | 1 | 8 | 2 | 11 |
-## | | 0.031 | 0.25 | 0.062 | |
-## | | 0.09 | 0.73 | 0.18 | 0.34 |
-## | | 0.07 | 0.67 | 0.4 | |
-## ----------------------------------------------------------------------------
-## | 6 | 2 | 4 | 1 | 7 |
-## | | 0.062 | 0.125 | 0.031 | |
-## | | 0.29 | 0.57 | 0.14 | 0.22 |
-## | | 0.13 | 0.33 | 0.2 | |
-## ----------------------------------------------------------------------------
-## | 8 | 12 | 0 | 2 | 14 |
-## | | 0.375 | 0 | 0.062 | |
-## | | 0.86 | 0 | 0.14 | 0.44 |
-## | | 0.8 | 0 | 0.4 | |
-## ----------------------------------------------------------------------------
-## | Column Total | 15 | 12 | 5 | 32 |
-## | | 0.468 | 0.375 | 0.155 | |
-## ----------------------------------------------------------------------------
-ds_twoway_table()
will return a tibble.
ds_twoway_table(mtcarz, cyl, gear)
## Joining, by = c("cyl", "gear", "count")
-## # A tibble: 8 x 6
-## cyl gear count percent row_percent col_percent
-## <fct> <fct> <int> <dbl> <dbl> <dbl>
-## 1 4 3 1 0.0312 0.0909 0.0667
-## 2 4 4 8 0.25 0.727 0.667
-## 3 4 5 2 0.0625 0.182 0.4
-## 4 6 3 2 0.0625 0.286 0.133
-## 5 6 4 4 0.125 0.571 0.333
-## 6 6 5 1 0.0312 0.143 0.2
-## 7 8 3 12 0.375 0.857 0.8
-## 8 8 5 2 0.0625 0.143 0.4
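The four numbers in each cell (frequency, percent, row percent and column percent) map onto base R's table() and prop.table(). A minimal sketch using mtcars directly:

```r
tab <- table(cyl = mtcars$cyl, gear = mtcars$gear)

tab                                      # frequencies
round(prop.table(tab), 3)                # proportion of the grand total
round(prop.table(tab, margin = 1), 2)    # row proportions
round(prop.table(tab, margin = 2), 2)    # column proportions
```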
-A plot method has been defined which will generate:
-k <- ds_cross_table(mtcarz, cyl, gear)
-plot(k)
k <- ds_cross_table(mtcarz, cyl, gear)
-plot(k, stacked = TRUE)
k <- ds_cross_table(mtcarz, cyl, gear)
-plot(k, proportional = TRUE)
The ds_freq_table()
function creates frequency tables for categorical variables.
ds_freq_table(mtcarz, cyl)
## Variable: cyl
-## -----------------------------------------------------------------------
-## Levels Frequency Cum Frequency Percent Cum Percent
-## -----------------------------------------------------------------------
-## 4 11 11 34.38 34.38
-## -----------------------------------------------------------------------
-## 6 7 18 21.88 56.25
-## -----------------------------------------------------------------------
-## 8 14 32 43.75 100
-## -----------------------------------------------------------------------
-## Total 32 - 100.00 -
-## -----------------------------------------------------------------------
-
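As a sketch of how the columns of a frequency table relate to each other, the base R code below builds the same frequency, cumulative frequency and percentage columns for cyl. It is an illustration, not descriptr's implementation.

```r
counts <- as.vector(table(mtcars$cyl))   # frequencies for levels 4, 6, 8

freq <- data.frame(
  level       = names(table(mtcars$cyl)),
  frequency   = counts,
  cum_freq    = cumsum(counts),
  percent     = round(100 * counts / sum(counts), 2),
  cum_percent = round(100 * cumsum(counts) / sum(counts), 2)
)
freq
```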
-The ds_freq_cont
function creates frequency tables for continuous variables. The default number of intervals is 5.
ds_freq_cont(mtcarz, mpg, 4)
## Variable: mpg
-## |---------------------------------------------------------------------------|
-## | Bins | Frequency | Cum Frequency | Percent | Cum Percent |
-## |---------------------------------------------------------------------------|
-## | 10.4 - 16.3 | 10 | 10 | 31.25 | 31.25 |
-## |---------------------------------------------------------------------------|
-## | 16.3 - 22.1 | 13 | 23 | 40.62 | 71.88 |
-## |---------------------------------------------------------------------------|
-## | 22.1 - 28 | 5 | 28 | 15.62 | 87.5 |
-## |---------------------------------------------------------------------------|
-## | 28 - 33.9 | 4 | 32 | 12.5 | 100 |
-## |---------------------------------------------------------------------------|
-## | Total | 32 | - | 100.00 | - |
-## |---------------------------------------------------------------------------|
-
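The binned table can be sketched with base R's cut(). Here we assume equal-width bins spanning the range of the data and right-closed intervals, which reproduces the counts shown above for mpg with 4 bins; descriptr's own handling of bin boundaries may differ.

```r
x <- mtcars$mpg
n_bins <- 4

# Equal-width bin edges from min to max (assumption about descriptr's binning).
breaks <- seq(min(x), max(x), length.out = n_bins + 1)

# Right-closed intervals; include.lowest keeps the minimum in the first bin.
bins   <- cut(x, breaks = breaks, include.lowest = TRUE)
counts <- as.vector(table(bins))

data.frame(
  bin         = levels(bins),
  frequency   = counts,
  cum_freq    = cumsum(counts),
  percent     = 100 * counts / length(x),
  cum_percent = 100 * cumsum(counts) / length(x)
)
```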
-The ds_group_summary()
function returns descriptive statistics of a continuous variable for the different levels of a categorical variable.
k <- ds_group_summary(mtcarz, cyl, mpg)
-k
## mpg by cyl
-## -----------------------------------------------------------------------------------------
-## | Statistic/Levels| 4| 6| 8|
-## -----------------------------------------------------------------------------------------
-## | Obs| 11| 7| 14|
-## | Minimum| 21.4| 17.8| 10.4|
-## | Maximum| 33.9| 21.4| 19.2|
-## | Mean| 26.66| 19.74| 15.1|
-## | Median| 26| 19.7| 15.2|
-## | Mode| 22.8| 21| 10.4|
-## | Std. Deviation| 4.51| 1.45| 2.56|
-## | Variance| 20.34| 2.11| 6.55|
-## | Skewness| 0.35| -0.26| -0.46|
-## | Kurtosis| -1.43| -1.83| 0.33|
-## | Uncorrected SS| 8023.83| 2741.14| 3277.34|
-## | Corrected SS| 203.39| 12.68| 85.2|
-## | Coeff Variation| 16.91| 7.36| 16.95|
-## | Std. Error Mean| 1.36| 0.55| 0.68|
-## | Range| 12.5| 3.6| 8.8|
-## | Interquartile Range| 7.6| 2.35| 1.85|
-## -----------------------------------------------------------------------------------------
-ds_group_summary()
also stores the statistics in a tibble, tidy_stats, which can be used for further analysis.
k$tidy_stats
## # A tibble: 3 x 15
-## cyl length min max mean median mode sd variance skewness
-## <fct> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-## 1 4 11 21.4 33.9 26.7 26 22.8 4.51 20.3 0.348
-## 2 6 7 17.8 21.4 19.7 19.7 21 1.45 2.11 -0.259
-## 3 8 14 10.4 19.2 15.1 15.2 10.4 2.56 6.55 -0.456
-## # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>,
-## # std_error <dbl>, range <dbl>, iqr <dbl>
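A subset of these grouped statistics can be computed in base R by splitting the continuous variable by the levels of the categorical one. This is a minimal sketch of the idea, not descriptr's implementation:

```r
by_cyl <- split(mtcars$mpg, mtcars$cyl)   # one numeric vector per level of cyl

# One row per level, one column per statistic.
group_stats <- t(sapply(by_cyl, function(x) c(
  obs    = length(x),
  min    = min(x),
  max    = max(x),
  mean   = mean(x),
  median = median(x),
  sd     = sd(x),
  range  = diff(range(x))
)))
round(group_stats, 2)
```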
-A plot()
method has been defined which draws a box plot of the continuous variable for each level of the categorical variable.
k <- ds_group_summary(mtcarz, cyl, mpg)
-plot(k)
The ds_multi_stats()
function generates summary/descriptive statistics for variables in a data frame/tibble.
ds_multi_stats(mtcarz, mpg, disp, hp)
## # A tibble: 3 x 16
-## vars min max mean t_mean median mode range variance stdev skew
-## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
-## 1 disp 71.1 472 231. 228 196. 276. 401. 15361. 124. 0.420
-## 2 hp 52 335 147. 144. 123 110 283 4701. 68.6 0.799
-## 3 mpg 10.4 33.9 20.1 20.0 19.2 10.4 23.5 36.3 6.03 0.672
-## # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>, q1 <dbl>,
-## # q3 <dbl>, iqrange <dbl>
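The one-row-per-variable layout can be sketched in base R by computing the statistics for each column and binding the rows together. Only a few of the columns above are reproduced here, as an illustration:

```r
vars <- c("mpg", "disp", "hp")

multi <- do.call(rbind, lapply(vars, function(v) {
  x <- mtcars[[v]]
  data.frame(vars = v, min = min(x), max = max(x), mean = mean(x),
             median = median(x), variance = var(x), stdev = sd(x))
}))
multi
```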
-The ds_oway_tables()
function creates multiple one-way tables by creating a frequency table for each categorical variable in a data frame.
ds_oway_tables(mtcarz)
## Variable: cyl
-## -----------------------------------------------------------------------
-## Levels Frequency Cum Frequency Percent Cum Percent
-## -----------------------------------------------------------------------
-## 4 11 11 34.38 34.38
-## -----------------------------------------------------------------------
-## 6 7 18 21.88 56.25
-## -----------------------------------------------------------------------
-## 8 14 32 43.75 100
-## -----------------------------------------------------------------------
-## Total 32 - 100.00 -
-## -----------------------------------------------------------------------
-##
-## Variable: vs
-## -----------------------------------------------------------------------
-## Levels Frequency Cum Frequency Percent Cum Percent
-## -----------------------------------------------------------------------
-## 0 18 18 56.25 56.25
-## -----------------------------------------------------------------------
-## 1 14 32 43.75 100
-## -----------------------------------------------------------------------
-## Total 32 - 100.00 -
-## -----------------------------------------------------------------------
-##
-## Variable: am
-## -----------------------------------------------------------------------
-## Levels Frequency Cum Frequency Percent Cum Percent
-## -----------------------------------------------------------------------
-## 0 19 19 59.38 59.38
-## -----------------------------------------------------------------------
-## 1 13 32 40.62 100
-## -----------------------------------------------------------------------
-## Total 32 - 100.00 -
-## -----------------------------------------------------------------------
-##
-## Variable: gear
-## -----------------------------------------------------------------------
-## Levels Frequency Cum Frequency Percent Cum Percent
-## -----------------------------------------------------------------------
-## 3 15 15 46.88 46.88
-## -----------------------------------------------------------------------
-## 4 12 27 37.5 84.38
-## -----------------------------------------------------------------------
-## 5 5 32 15.62 100
-## -----------------------------------------------------------------------
-## Total 32 - 100.00 -
-## -----------------------------------------------------------------------
-##
-## Variable: carb
-## -----------------------------------------------------------------------
-## Levels Frequency Cum Frequency Percent Cum Percent
-## -----------------------------------------------------------------------
-## 1 7 7 21.88 21.88
-## -----------------------------------------------------------------------
-## 2 10 17 31.25 53.12
-## -----------------------------------------------------------------------
-## 3 3 20 9.38 62.5
-## -----------------------------------------------------------------------
-## 4 10 30 31.25 93.75
-## -----------------------------------------------------------------------
-## 6 1 31 3.12 96.88
-## -----------------------------------------------------------------------
-## 8 1 32 3.12 100
-## -----------------------------------------------------------------------
-## Total 32 - 100.00 -
-## -----------------------------------------------------------------------
-The ds_tway_tables()
function creates multiple two-way tables by creating a cross table for each unique pair of categorical variables in a data frame.
ds_tway_tables(mtcarz)
## Cell Contents
-## |---------------|
-## | Frequency |
-## | Percent |
-## | Row Pct |
-## | Col Pct |
-## |---------------|
-##
-## Total Observations: 32
-##
-## cyl vs vs
-## -------------------------------------------------------------
-## | | vs |
-## -------------------------------------------------------------
-## | cyl | 0 | 1 | Row Total |
-## -------------------------------------------------------------
-## | 4 | 1 | 10 | 11 |
-## | | 0.031 | 0.312 | |
-## | | 0.09 | 0.91 | 0.34 |
-## | | 0.06 | 0.71 | |
-## -------------------------------------------------------------
-## | 6 | 3 | 4 | 7 |
-## | | 0.094 | 0.125 | |
-## | | 0.43 | 0.57 | 0.22 |
-## | | 0.17 | 0.29 | |
-## -------------------------------------------------------------
-## | 8 | 14 | 0 | 14 |
-## | | 0.438 | 0 | |
-## | | 1 | 0 | 0.44 |
-## | | 0.78 | 0 | |
-## -------------------------------------------------------------
-## | Column Total | 18 | 14 | 32 |
-## | | 0.563 | 0.437 | |
-## -------------------------------------------------------------
-##
-##
-## cyl vs am
-## -------------------------------------------------------------
-## | | am |
-## -------------------------------------------------------------
-## | cyl | 0 | 1 | Row Total |
-## -------------------------------------------------------------
-## | 4 | 3 | 8 | 11 |
-## | | 0.094 | 0.25 | |
-## | | 0.27 | 0.73 | 0.34 |
-## | | 0.16 | 0.62 | |
-## -------------------------------------------------------------
-## | 6 | 4 | 3 | 7 |
-## | | 0.125 | 0.094 | |
-## | | 0.57 | 0.43 | 0.22 |
-## | | 0.21 | 0.23 | |
-## -------------------------------------------------------------
-## | 8 | 12 | 2 | 14 |
-## | | 0.375 | 0.062 | |
-## | | 0.86 | 0.14 | 0.44 |
-## | | 0.63 | 0.15 | |
-## -------------------------------------------------------------
-## | Column Total | 19 | 13 | 32 |
-## | | 0.594 | 0.406 | |
-## -------------------------------------------------------------
-##
-##
-## cyl vs gear
-## ----------------------------------------------------------------------------
-## | | gear |
-## ----------------------------------------------------------------------------
-## | cyl | 3 | 4 | 5 | Row Total |
-## ----------------------------------------------------------------------------
-## | 4 | 1 | 8 | 2 | 11 |
-## | | 0.031 | 0.25 | 0.062 | |
-## | | 0.09 | 0.73 | 0.18 | 0.34 |
-## | | 0.07 | 0.67 | 0.4 | |
-## ----------------------------------------------------------------------------
-## | 6 | 2 | 4 | 1 | 7 |
-## | | 0.062 | 0.125 | 0.031 | |
-## | | 0.29 | 0.57 | 0.14 | 0.22 |
-## | | 0.13 | 0.33 | 0.2 | |
-## ----------------------------------------------------------------------------
-## | 8 | 12 | 0 | 2 | 14 |
-## | | 0.375 | 0 | 0.062 | |
-## | | 0.86 | 0 | 0.14 | 0.44 |
-## | | 0.8 | 0 | 0.4 | |
-## ----------------------------------------------------------------------------
-## | Column Total | 15 | 12 | 5 | 32 |
-## | | 0.468 | 0.375 | 0.155 | |
-## ----------------------------------------------------------------------------
-##
-##
-## cyl vs carb
-## -------------------------------------------------------------------------------------------------------------------------
-## | | carb |
-## -------------------------------------------------------------------------------------------------------------------------
-## | cyl | 1 | 2 | 3 | 4 | 6 | 8 | Row Total |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 4 | 5 | 6 | 0 | 0 | 0 | 0 | 11 |
-## | | 0.156 | 0.188 | 0 | 0 | 0 | 0 | |
-## | | 0.45 | 0.55 | 0 | 0 | 0 | 0 | 0.34 |
-## | | 0.71 | 0.6 | 0 | 0 | 0 | 0 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 6 | 2 | 0 | 0 | 4 | 1 | 0 | 7 |
-## | | 0.062 | 0 | 0 | 0.125 | 0.031 | 0 | |
-## | | 0.29 | 0 | 0 | 0.57 | 0.14 | 0 | 0.22 |
-## | | 0.29 | 0 | 0 | 0.4 | 1 | 0 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 8 | 0 | 4 | 3 | 6 | 0 | 1 | 14 |
-## | | 0 | 0.125 | 0.094 | 0.188 | 0 | 0.031 | |
-## | | 0 | 0.29 | 0.21 | 0.43 | 0 | 0.07 | 0.44 |
-## | | 0 | 0.4 | 1 | 0.6 | 0 | 1 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | Column Total | 7 | 10 | 3 | 10 | 1 | 1 | 32 |
-## | | 0.218 | 0.313 | 0.094 | 0.313 | 0.031 | 0.031 | |
-## -------------------------------------------------------------------------------------------------------------------------
-##
-##
-## vs vs am
-## -------------------------------------------------------------
-## | | am |
-## -------------------------------------------------------------
-## | vs | 0 | 1 | Row Total |
-## -------------------------------------------------------------
-## | 0 | 12 | 6 | 18 |
-## | | 0.375 | 0.188 | |
-## | | 0.67 | 0.33 | 0.56 |
-## | | 0.63 | 0.46 | |
-## -------------------------------------------------------------
-## | 1 | 7 | 7 | 14 |
-## | | 0.219 | 0.219 | |
-## | | 0.5 | 0.5 | 0.44 |
-## | | 0.37 | 0.54 | |
-## -------------------------------------------------------------
-## | Column Total | 19 | 13 | 32 |
-## | | 0.594 | 0.407 | |
-## -------------------------------------------------------------
-##
-##
-## vs vs gear
-## ----------------------------------------------------------------------------
-## | | gear |
-## ----------------------------------------------------------------------------
-## | vs | 3 | 4 | 5 | Row Total |
-## ----------------------------------------------------------------------------
-## | 0 | 12 | 2 | 4 | 18 |
-## | | 0.375 | 0.062 | 0.125 | |
-## | | 0.67 | 0.11 | 0.22 | 0.56 |
-## | | 0.8 | 0.17 | 0.8 | |
-## ----------------------------------------------------------------------------
-## | 1 | 3 | 10 | 1 | 14 |
-## | | 0.094 | 0.312 | 0.031 | |
-## | | 0.21 | 0.71 | 0.07 | 0.44 |
-## | | 0.2 | 0.83 | 0.2 | |
-## ----------------------------------------------------------------------------
-## | Column Total | 15 | 12 | 5 | 32 |
-## | | 0.469 | 0.374 | 0.156 | |
-## ----------------------------------------------------------------------------
-##
-##
-## vs vs carb
-## -------------------------------------------------------------------------------------------------------------------------
-## | | carb |
-## -------------------------------------------------------------------------------------------------------------------------
-## | vs | 1 | 2 | 3 | 4 | 6 | 8 | Row Total |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 0 | 0 | 5 | 3 | 8 | 1 | 1 | 18 |
-## | | 0 | 0.156 | 0.094 | 0.25 | 0.031 | 0.031 | |
-## | | 0 | 0.28 | 0.17 | 0.44 | 0.06 | 0.06 | 0.56 |
-## | | 0 | 0.5 | 1 | 0.8 | 1 | 1 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 1 | 7 | 5 | 0 | 2 | 0 | 0 | 14 |
-## | | 0.219 | 0.156 | 0 | 0.062 | 0 | 0 | |
-## | | 0.5 | 0.36 | 0 | 0.14 | 0 | 0 | 0.44 |
-## | | 1 | 0.5 | 0 | 0.2 | 0 | 0 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | Column Total | 7 | 10 | 3 | 10 | 1 | 1 | 32 |
-## | | 0.219 | 0.312 | 0.094 | 0.312 | 0.031 | 0.031 | |
-## -------------------------------------------------------------------------------------------------------------------------
-##
-##
-## am vs gear
-## ----------------------------------------------------------------------------
-## | | gear |
-## ----------------------------------------------------------------------------
-## | am | 3 | 4 | 5 | Row Total |
-## ----------------------------------------------------------------------------
-## | 0 | 15 | 4 | 0 | 19 |
-## | | 0.469 | 0.125 | 0 | |
-## | | 0.79 | 0.21 | 0 | 0.59 |
-## | | 1 | 0.33 | 0 | |
-## ----------------------------------------------------------------------------
-## | 1 | 0 | 8 | 5 | 13 |
-## | | 0 | 0.25 | 0.156 | |
-## | | 0 | 0.62 | 0.38 | 0.41 |
-## | | 0 | 0.67 | 1 | |
-## ----------------------------------------------------------------------------
-## | Column Total | 15 | 12 | 5 | 32 |
-## | | 0.469 | 0.375 | 0.156 | |
-## ----------------------------------------------------------------------------
-##
-##
-## am vs carb
-## -------------------------------------------------------------------------------------------------------------------------
-## | | carb |
-## -------------------------------------------------------------------------------------------------------------------------
-## | am | 1 | 2 | 3 | 4 | 6 | 8 | Row Total |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 0 | 3 | 6 | 3 | 7 | 0 | 0 | 19 |
-## | | 0.094 | 0.188 | 0.094 | 0.219 | 0 | 0 | |
-## | | 0.16 | 0.32 | 0.16 | 0.37 | 0 | 0 | 0.6 |
-## | | 0.43 | 0.6 | 1 | 0.7 | 0 | 0 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 1 | 4 | 4 | 0 | 3 | 1 | 1 | 13 |
-## | | 0.125 | 0.125 | 0 | 0.094 | 0.031 | 0.031 | |
-## | | 0.31 | 0.31 | 0 | 0.23 | 0.08 | 0.08 | 0.41 |
-## | | 0.57 | 0.4 | 0 | 0.3 | 1 | 1 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | Column Total | 7 | 10 | 3 | 10 | 1 | 1 | 32 |
-## | | 0.219 | 0.313 | 0.094 | 0.313 | 0.031 | 0.031 | |
-## -------------------------------------------------------------------------------------------------------------------------
-##
-##
-## gear vs carb
-## -------------------------------------------------------------------------------------------------------------------------
-## | | carb |
-## -------------------------------------------------------------------------------------------------------------------------
-## | gear | 1 | 2 | 3 | 4 | 6 | 8 | Row Total |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 3 | 3 | 4 | 3 | 5 | 0 | 0 | 15 |
-## | | 0.094 | 0.125 | 0.094 | 0.156 | 0 | 0 | |
-## | | 0.2 | 0.27 | 0.2 | 0.33 | 0 | 0 | 0.47 |
-## | | 0.43 | 0.4 | 1 | 0.5 | 0 | 0 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 4 | 4 | 4 | 0 | 4 | 0 | 0 | 12 |
-## | | 0.125 | 0.125 | 0 | 0.125 | 0 | 0 | |
-## | | 0.33 | 0.33 | 0 | 0.33 | 0 | 0 | 0.38 |
-## | | 0.57 | 0.4 | 0 | 0.4 | 0 | 0 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | 5 | 0 | 2 | 0 | 1 | 1 | 1 | 5 |
-## | | 0 | 0.062 | 0 | 0.031 | 0.031 | 0.031 | |
-## | | 0 | 0.4 | 0 | 0.2 | 0.2 | 0.2 | 0.16 |
-## | | 0 | 0.2 | 0 | 0.1 | 1 | 1 | |
-## -------------------------------------------------------------------------------------------------------------------------
-## | Column Total | 7 | 10 | 3 | 10 | 1 | 1 | 32 |
-## | | 0.219 | 0.312 | 0.094 | 0.312 | 0.031 | 0.031 | |
-## -------------------------------------------------------------------------------------------------------------------------
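The pairing logic can be sketched with combn(), which enumerates every unique pair of factor columns. As before, we assume mtcarz is mtcars with cyl, vs, am, gear and carb converted to factors; mtcarz_sketch is a hypothetical name.

```r
# Assumption: mtcarz is mtcars with the discrete columns as factors.
mtcarz_sketch <- mtcars
for (col in c("cyl", "vs", "am", "gear", "carb")) {
  mtcarz_sketch[[col]] <- factor(mtcarz_sketch[[col]])
}

factor_cols <- names(Filter(is.factor, mtcarz_sketch))
pairs <- combn(factor_cols, 2, simplify = FALSE)   # every unique pair

# One two-way table of counts per pair, named like "cyl vs gear".
tables <- lapply(pairs, function(p) table(mtcarz_sketch[[p[1]]], mtcarz_sketch[[p[2]]]))
names(tables) <- vapply(pairs, paste, character(1), collapse = " vs ")

tables[["am vs gear"]]
```

With five categorical variables this yields choose(5, 2) = 10 tables, one for each pair printed above.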
-In exploring statistical distributions, we focus on the following:
-To explore the above 3 concepts, we have defined functions for the following distributions:
-Visualize how changes in mean and standard deviation affect the shape of the normal distribution.
-Suppose X, the grade on an exam, is normally distributed with mean 60 and standard deviation 3. The teacher wants to give 10% of the class an A. What should be the cutoff to determine who gets an A?
- -## Warning in dist_norm_perc(0.1, 60, 3, "upper"): `dist_normal_perc()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-
-The teacher wants to give the lower 15% of the class a D. What cutoff should the teacher use to determine who gets a D?
- -## Warning in dist_norm_perc(0.85, 60, 3, "lower"): `dist_normal_perc()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-
-The teacher wants to give the middle 50% of the class a B. What cutoffs should the teacher use to determine who gets a B?
- -## Warning in dist_norm_perc(0.5, 60, 3, "both"): `dist_normal_perc()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-
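Since the dist_* helpers are soft deprecated (see the warnings above), it may help to see the same cutoffs computed with base R's quantile function qnorm(). This sketch reads the three questions as the top 10%, the bottom 15% and the middle 50% of a normal distribution with mean 60 and standard deviation 3.

```r
mu <- 60
sigma <- 3

a_cutoff <- qnorm(0.10, mean = mu, sd = sigma, lower.tail = FALSE)  # top 10% get an A
d_cutoff <- qnorm(0.15, mean = mu, sd = sigma)                      # bottom 15% get a D
b_range  <- qnorm(c(0.25, 0.75), mean = mu, sd = sigma)             # middle 50% get a B

round(c(A = a_cutoff, D = d_cutoff, B_low = b_range[1], B_high = b_range[2]), 2)
```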
-Let X be the IQ of a randomly selected student of a school. Assume X ~ N(90, 4). What is the probability that a randomly selected student has an IQ below 80?
- -## Warning in dist_norm_prob(80, mean = 90, sd = 4): `dist_normal_prob()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-
-What is the probability that a randomly selected student has an IQ above 100?
- -## Warning in dist_norm_prob(100, mean = 90, sd = 4, type = "upper"):
-## `dist_normal_prob()` has been soft deprecated and will be removed in
-## the next version of descriptr. Please use the vistributions package for
-## visualizing probability distributions.
-
-What is the probability that a randomly selected student has an IQ between 85 and 100?
- -## Warning in dist_norm_prob(c(85, 100), mean = 90, sd = 4, type = "both"):
-## `dist_normal_prob()` has been soft deprecated and will be removed in
-## the next version of descriptr. Please use the vistributions package for
-## visualizing probability distributions.
-
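The three IQ probabilities can also be computed with base R's pnorm(), which gives the area to the left of a value (or to the right with lower.tail = FALSE). A minimal sketch for X ~ N(90, 4), reading 4 as the standard deviation as in the calls above:

```r
mu <- 90
sigma <- 4

p_below_80  <- pnorm(80, mean = mu, sd = sigma)                       # P(X < 80)
p_above_100 <- pnorm(100, mean = mu, sd = sigma, lower.tail = FALSE)  # P(X > 100)
p_85_to_100 <- pnorm(100, mean = mu, sd = sigma) -
  pnorm(85, mean = mu, sd = sigma)                                    # P(85 < X < 100)

round(c(below_80 = p_below_80, above_100 = p_above_100, between = p_85_to_100), 4)
```

Because 80 and 100 are both 2.5 standard deviations from the mean, the first two probabilities are equal.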
-Visualize how changes in the number of trials and the probability of success affect the shape of the binomial distribution.
- -## Warning in dist_binom_plot(10, 0.3): `dist_binom_plot()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-## Warning in dist_binom_perc(10, 0.5, 0.05): `dist_binom_perc()` has been
-## soft deprecated and will be removed in the next version of descriptr.
-## Please use the vistributions package for visualizing probability
-## distributions.
-
-
-## Warning in dist_binom_perc(10, 0.5, 0.05, "upper"): `dist_binom_perc()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-
-Assume twenty percent (20%) of Magemill have no health insurance. Randomly sample n = 12 Magemillians. Let X denote the number in the sample with no health insurance. What is the probability that exactly 4 of the 12 sampled have no health insurance?
- -## Warning in dist_binom_prob(12, 0.2, 4, type = "exact"): `dist_binom_prob()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-
-What is the probability that at most one of those sampled has no health insurance?
- -## Warning in dist_binom_prob(12, 0.2, 1, "lower"): `dist_binom_prob()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-
-What is the probability that more than seven have no health insurance?
- -## Warning in dist_binom_prob(12, 0.2, 8, "upper"): `dist_binom_prob()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-
-What is the probability that fewer than 5 have no health insurance?
- -## Warning in dist_binom_prob(12, 0.2, c(0, 4), "interval"):
-## `dist_binom_prob()` has been soft deprecated and will be removed in
-## the next version of descriptr. Please use the vistributions package for
-## visualizing probability distributions.
-
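For reference, the four health-insurance questions map onto base R's dbinom() (exact probability) and pbinom() (cumulative probability). A sketch for X ~ Binomial(n = 12, p = 0.2), the parameters used in the calls above:

```r
n <- 12
p <- 0.2

p_exactly_4   <- dbinom(4, size = n, prob = p)                      # P(X = 4)
p_at_most_1   <- pbinom(1, size = n, prob = p)                      # P(X <= 1)
p_more_than_7 <- pbinom(7, size = n, prob = p, lower.tail = FALSE)  # P(X > 7) = P(X >= 8)
p_fewer_5     <- pbinom(4, size = n, prob = p)                      # P(X < 5) = P(X <= 4)

round(c(p_exactly_4, p_at_most_1, p_more_than_7, p_fewer_5), 4)
```

Note that "more than seven" and "fewer than 5" become thresholds of 7 and 4, because the binomial is discrete and pbinom() includes its boundary value.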
-Visualize how changes in degrees of freedom affect the shape of the chi-square distribution.
- -## Warning in dist_chi_plot(df = 5): `dist_chi_plot()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-## Warning in dist_chi_plot(df = 5, normal = TRUE): `dist_chi_plot()`
-## has been soft deprecated and will be removed in the next version of
-## descriptr. Please use the vistributions package for visualizing probability
-## distributions.
-Let X be a chi-square random variable with 8 degrees of freedom. What is the upper fifth percentile?
- -## Warning in dist_chi_perc(0.05, 8, "upper"): `dist_chi_perc()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the tenth percentile?
- -## Warning in dist_chi_perc(0.1, 8, "lower"): `dist_chi_perc()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the probability that a chi-square random variable with 12 degrees of freedom is greater than 8.79?
- - -## Warning in dist_chi_prob(8.79, 12, "upper"): `dist_chi_prob()` has been
-## soft deprecated and will be removed in the next version of descriptr.
-## Please use the vistributions package for visualizing probability
-## distributions.
-
-What is the probability that a chi-square random variable with 12 degrees of freedom is less than 8.62?
- - -## Warning in dist_chi_prob(8.62, 12, "lower"): `dist_chi_prob()` has been
-## soft deprecated and will be removed in the next version of descriptr.
-## Please use the vistributions package for visualizing probability
-## distributions.
-
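The chi-square percentiles and probabilities correspond to base R's qchisq() and pchisq(). A sketch of the four questions; the last one follows the "lower" tail that the dist_chi_prob(8.62, 12, "lower") call above requests:

```r
upper_5th <- qchisq(0.05, df = 8, lower.tail = FALSE)   # value with 5% of the area above it
tenth     <- qchisq(0.10, df = 8)                       # 10th percentile
p_gt_879  <- pchisq(8.79, df = 12, lower.tail = FALSE)  # P(X > 8.79), 12 df
p_lt_862  <- pchisq(8.62, df = 12)                      # P(X < 8.62), 12 df

round(c(upper_5th, tenth, p_gt_879, p_lt_862), 4)
```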
-Visualize how changes in degrees of freedom affect the shape of the F distribution.
- -## Warning in dist_f_plot(): `dist_f_plot()` has been soft deprecated and will
-## be removed in the next version of descriptr. Please use the vistributions
-## package for visualizing probability distributions.
-
-
-## Warning in dist_f_plot(6, 10, normal = TRUE): `dist_f_plot()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-Let X be an F random variable with 4 numerator degrees of freedom and 5 denominator degrees of freedom. What is the upper twentieth percentile?
- -## Warning in dist_f_perc(0.2, 4, 5, "upper"): `dist_f_perc()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the 35th percentile?
- -## Warning in dist_f_perc(0.35, 4, 5, "lower"): `dist_f_perc()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the probability that an F random variable with 4 numerator degrees of freedom and 5 denominator degrees of freedom is greater than 3.89?
- -## Warning in dist_f_prob(3.89, 4, 5, "upper"): `dist_f_prob()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the probability that an F random variable with 4 numerator degrees of freedom and 5 denominator degrees of freedom is less than 2.63?
- -## Warning in dist_f_prob(2.63, 4, 5, "lower"): `dist_f_prob()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
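The same F questions can be answered with base R's qf() and pf(), using df1 for the numerator and df2 for the denominator degrees of freedom. A minimal sketch:

```r
upper_20th <- qf(0.20, df1 = 4, df2 = 5, lower.tail = FALSE)  # 20% of the area lies above it
pct_35     <- qf(0.35, df1 = 4, df2 = 5)                      # 35th percentile
p_gt_389   <- pf(3.89, df1 = 4, df2 = 5, lower.tail = FALSE)  # P(X > 3.89)
p_lt_263   <- pf(2.63, df1 = 4, df2 = 5)                      # P(X < 2.63)

round(c(upper_20th, pct_35, p_gt_389, p_lt_263), 4)
```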
-Visualize how degrees of freedom affect the shape of the t distribution.
- -## Warning in dist_t_plot(df = 8): `dist_t_plot()` has been soft deprecated
-## and will be removed in the next version of descriptr. Please use the
-## vistributions package for visualizing probability distributions.
-
-What is the upper fifteenth percentile?
- -## Warning in dist_t_perc(0.15, 8, "upper"): `dist_t_perc()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the eleventh percentile?
- -## Warning in dist_t_perc(0.11, 8, "lower"): `dist_t_perc()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What values of T bound the middle 80% of the area under the curve?
- -## Warning in dist_t_perc(0.8, 8, "both"): `dist_t_perc()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-Let T follow a t-distribution with r = 6 df.
-What is the probability that the value of T is less than 2?
- -## Warning in dist_t_prob(2, 6, "lower"): `dist_t_prob()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the probability that the value of T is greater than 2?
- -## Warning in dist_t_prob(2, 6, "upper"): `dist_t_prob()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the probability that the value of T is between -2 and 2?
- -## Warning in dist_t_prob(2, 6, "both"): `dist_t_prob()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-What is the probability that the absolute value of T is greater than 2?
- -## Warning in dist_t_prob(2, 6, "interval"): `dist_t_prob()` has been soft
-## deprecated and will be removed in the next version of descriptr. Please use
-## the vistributions package for visualizing probability distributions.
-
-vignettes/visualization.Rmd
+ visualization.Rmd
In this document, we will introduce you to functions for generating different types of plots.
+We have modified the mtcars data to create a new data set mtcarz. The only difference between the two data sets is related to the variable types.
str(mtcarz)
+#> 'data.frame': 32 obs. of 11 variables:
+#> $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
+#> $ cyl : Factor w/ 3 levels "4","6","8": 2 2 1 2 3 2 3 1 1 2 ...
+#> $ disp: num 160 160 108 258 360 ...
+#> $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
+#> $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
+#> $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
+#> $ qsec: num 16.5 17 18.6 19.4 17 ...
+#> $ vs : Factor w/ 2 levels "0","1": 1 1 2 2 1 2 1 2 2 2 ...
+#> $ am : Factor w/ 2 levels "0","1": 2 2 2 1 1 1 1 1 1 1 ...
+#> $ gear: Factor w/ 3 levels "3","4","5": 2 2 2 1 1 1 1 2 2 2 ...
+#> $ carb: Factor w/ 6 levels "1","2","3","4",..: 4 4 1 1 2 1 4 2 2 4 ...
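Judging from the str() output, mtcarz looks like mtcars with cyl, vs, am, gear and carb stored as factors. The base-R sketch below shows that conversion. It assumes this is the only change, and it is not necessarily how descriptr builds mtcarz. The name cars is hypothetical.

```r
# Assumption: mtcarz differs from mtcars only in that these columns are factors.
cars <- mtcars
cat_vars <- c("cyl", "vs", "am", "gear", "carb")
cars[cat_vars] <- lapply(cars[cat_vars], factor)

# The continuous columns (mpg, disp, hp, drat, wt, qsec) stay numeric.
str(cars)
```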
The following functions create plots for all or a subset of the continuous variables in the data set.
+ds_auto_freq_table() creates multiple one-way tables by creating a frequency table for each categorical variable in a data frame.
+ds_auto_cross_table() creates multiple two-way tables by creating a cross table for each unique pair of categorical variables in a data frame.
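The base-R sketch below illustrates what "one frequency table per categorical variable" means. It is not descriptr's implementation. It assumes that factor columns count as categorical and rebuilds mtcarz-like data from mtcars (the names cars and one_way are hypothetical). Each table has the same columns that ds_auto_freq_table() prints.

```r
# Assumption: factor columns play the role of "categorical variables".
cars <- mtcars
cat_vars <- c("cyl", "vs", "am", "gear", "carb")
cars[cat_vars] <- lapply(cars[cat_vars], factor)

# One frequency table per factor column.
is_cat <- vapply(cars, is.factor, logical(1))
one_way <- lapply(cars[is_cat], function(x) {
  freq <- as.vector(table(x))  # counts for every level, in level order
  data.frame(
    Levels = levels(x),
    Frequency = freq,
    `Cum Frequency` = cumsum(freq),
    Percent = round(100 * freq / sum(freq), 2),
    `Cum Percent` = round(100 * cumsum(freq) / sum(freq), 2),
    check.names = FALSE
  )
})

one_way$cyl
```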
ds_auto_freq_table(data, ...)
+ds_auto_cross_table(data, ...)
Argument | Description
---|---
data | A data.frame or tibble.
... | Column(s) in data.
+ds_auto_freq_table() is an extension of the ds_freq_table() function. It creates a frequency table for each categorical variable in the data frame. ds_auto_cross_table() is an extension of the ds_cross_table() function. It creates a two-way table for each unique pair of categorical variables in the data frame.
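The "each unique pair" behaviour can be sketched in base R with utils::combn() and table(). This is not descriptr's code. It assumes factor columns are the categorical variables, and the names cars, pairs and two_way are hypothetical. With five factors there are choose(5, 2) = 10 pairs, which matches the ten tables in the example output below.

```r
cars <- mtcars
cat_vars <- c("cyl", "vs", "am", "gear", "carb")
cars[cat_vars] <- lapply(cars[cat_vars], factor)

# Every unique (unordered) pair of factor columns.
is_cat <- vapply(cars, is.factor, logical(1))
pairs <- combn(names(cars)[is_cat], 2, simplify = FALSE)

# One two-way table of counts per pair, named like "cyl vs gear".
two_way <- lapply(pairs, function(p) table(cars[[p[1]]], cars[[p[2]]], dnn = p))
names(two_way) <- vapply(pairs, paste, character(1), collapse = " vs ")

tab <- two_way[["cyl vs gear"]]
addmargins(tab)               # counts with row and column totals
round(prop.table(tab), 3)     # proportion of all observations
round(prop.table(tab, 1), 2)  # row proportions
round(prop.table(tab, 2), 2)  # column proportions
```

The three prop.table() calls roughly correspond to the Percent, Row Pct and Col Pct lines in each cell of descriptr's cross tables.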
ds_oway_tables() and ds_tway_tables() have been deprecated.
+Use ds_auto_freq_table() and ds_auto_cross_table() instead.
See also: ds_freq_table(), ds_cross_table()
+# multiple one way tables +ds_auto_freq_table(mtcarz)#> Variable: cyl +#> ----------------------------------------------------------------------- +#> Levels Frequency Cum Frequency Percent Cum Percent +#> ----------------------------------------------------------------------- +#> 4 11 11 34.38 34.38 +#> ----------------------------------------------------------------------- +#> 6 7 18 21.88 56.25 +#> ----------------------------------------------------------------------- +#> 8 14 32 43.75 100 +#> ----------------------------------------------------------------------- +#> Total 32 - 100.00 - +#> ----------------------------------------------------------------------- +#> +#> Variable: vs +#> ----------------------------------------------------------------------- +#> Levels Frequency Cum Frequency Percent Cum Percent +#> ----------------------------------------------------------------------- +#> 0 18 18 56.25 56.25 +#> ----------------------------------------------------------------------- +#> 1 14 32 43.75 100 +#> ----------------------------------------------------------------------- +#> Total 32 - 100.00 - +#> ----------------------------------------------------------------------- +#> +#> Variable: am +#> ----------------------------------------------------------------------- +#> Levels Frequency Cum Frequency Percent Cum Percent +#> ----------------------------------------------------------------------- +#> 0 19 19 59.38 59.38 +#> ----------------------------------------------------------------------- +#> 1 13 32 40.62 100 +#> ----------------------------------------------------------------------- +#> Total 32 - 100.00 - +#> ----------------------------------------------------------------------- +#> +#> Variable: gear +#> ----------------------------------------------------------------------- +#> Levels Frequency Cum Frequency Percent Cum Percent +#> ----------------------------------------------------------------------- +#> 3 15 15 46.88 46.88 +#> 
----------------------------------------------------------------------- +#> 4 12 27 37.5 84.38 +#> ----------------------------------------------------------------------- +#> 5 5 32 15.62 100 +#> ----------------------------------------------------------------------- +#> Total 32 - 100.00 - +#> ----------------------------------------------------------------------- +#> +#> Variable: carb +#> ----------------------------------------------------------------------- +#> Levels Frequency Cum Frequency Percent Cum Percent +#> ----------------------------------------------------------------------- +#> 1 7 7 21.88 21.88 +#> ----------------------------------------------------------------------- +#> 2 10 17 31.25 53.12 +#> ----------------------------------------------------------------------- +#> 3 3 20 9.38 62.5 +#> ----------------------------------------------------------------------- +#> 4 10 30 31.25 93.75 +#> ----------------------------------------------------------------------- +#> 6 1 31 3.12 96.88 +#> ----------------------------------------------------------------------- +#> 8 1 32 3.12 100 +#> ----------------------------------------------------------------------- +#> Total 32 - 100.00 - +#> ----------------------------------------------------------------------- +#>ds_auto_freq_table(mtcarz, cyl, gear)#> Variable: cyl +#> ----------------------------------------------------------------------- +#> Levels Frequency Cum Frequency Percent Cum Percent +#> ----------------------------------------------------------------------- +#> 4 11 11 34.38 34.38 +#> ----------------------------------------------------------------------- +#> 6 7 18 21.88 56.25 +#> ----------------------------------------------------------------------- +#> 8 14 32 43.75 100 +#> ----------------------------------------------------------------------- +#> Total 32 - 100.00 - +#> ----------------------------------------------------------------------- +#> +#> Variable: gear +#> 
----------------------------------------------------------------------- +#> Levels Frequency Cum Frequency Percent Cum Percent +#> ----------------------------------------------------------------------- +#> 3 15 15 46.88 46.88 +#> ----------------------------------------------------------------------- +#> 4 12 27 37.5 84.38 +#> ----------------------------------------------------------------------- +#> 5 5 32 15.62 100 +#> ----------------------------------------------------------------------- +#> Total 32 - 100.00 - +#> ----------------------------------------------------------------------- +#>+# multiple two way tables +ds_auto_cross_table(mtcarz)#> Cell Contents +#> |---------------| +#> | Frequency | +#> | Percent | +#> | Row Pct | +#> | Col Pct | +#> |---------------| +#> +#> Total Observations: 32 +#> +#> cyl vs vs +#> ------------------------------------------------------------- +#> | | vs | +#> ------------------------------------------------------------- +#> | cyl | 0 | 1 | Row Total | +#> ------------------------------------------------------------- +#> | 4 | 1 | 10 | 11 | +#> | | 0.031 | 0.312 | | +#> | | 0.09 | 0.91 | 0.34 | +#> | | 0.06 | 0.71 | | +#> ------------------------------------------------------------- +#> | 6 | 3 | 4 | 7 | +#> | | 0.094 | 0.125 | | +#> | | 0.43 | 0.57 | 0.22 | +#> | | 0.17 | 0.29 | | +#> ------------------------------------------------------------- +#> | 8 | 14 | 0 | 14 | +#> | | 0.438 | 0 | | +#> | | 1 | 0 | 0.44 | +#> | | 0.78 | 0 | | +#> ------------------------------------------------------------- +#> | Column Total | 18 | 14 | 32 | +#> | | 0.563 | 0.437 | | +#> ------------------------------------------------------------- +#> +#> +#> cyl vs am +#> ------------------------------------------------------------- +#> | | am | +#> ------------------------------------------------------------- +#> | cyl | 0 | 1 | Row Total | +#> ------------------------------------------------------------- +#> | 4 | 3 | 8 | 11 | +#> | | 0.094 | 
0.25 | | +#> | | 0.27 | 0.73 | 0.34 | +#> | | 0.16 | 0.62 | | +#> ------------------------------------------------------------- +#> | 6 | 4 | 3 | 7 | +#> | | 0.125 | 0.094 | | +#> | | 0.57 | 0.43 | 0.22 | +#> | | 0.21 | 0.23 | | +#> ------------------------------------------------------------- +#> | 8 | 12 | 2 | 14 | +#> | | 0.375 | 0.062 | | +#> | | 0.86 | 0.14 | 0.44 | +#> | | 0.63 | 0.15 | | +#> ------------------------------------------------------------- +#> | Column Total | 19 | 13 | 32 | +#> | | 0.594 | 0.406 | | +#> ------------------------------------------------------------- +#> +#> +#> cyl vs gear +#> ---------------------------------------------------------------------------- +#> | | gear | +#> ---------------------------------------------------------------------------- +#> | cyl | 3 | 4 | 5 | Row Total | +#> ---------------------------------------------------------------------------- +#> | 4 | 1 | 8 | 2 | 11 | +#> | | 0.031 | 0.25 | 0.062 | | +#> | | 0.09 | 0.73 | 0.18 | 0.34 | +#> | | 0.07 | 0.67 | 0.4 | | +#> ---------------------------------------------------------------------------- +#> | 6 | 2 | 4 | 1 | 7 | +#> | | 0.062 | 0.125 | 0.031 | | +#> | | 0.29 | 0.57 | 0.14 | 0.22 | +#> | | 0.13 | 0.33 | 0.2 | | +#> ---------------------------------------------------------------------------- +#> | 8 | 12 | 0 | 2 | 14 | +#> | | 0.375 | 0 | 0.062 | | +#> | | 0.86 | 0 | 0.14 | 0.44 | +#> | | 0.8 | 0 | 0.4 | | +#> ---------------------------------------------------------------------------- +#> | Column Total | 15 | 12 | 5 | 32 | +#> | | 0.468 | 0.375 | 0.155 | | +#> ---------------------------------------------------------------------------- +#> +#> +#> cyl vs carb +#> ------------------------------------------------------------------------------------------------------------------------- +#> | | carb | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | cyl | 1 | 2 | 3 | 4 | 6 | 
8 | Row Total | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 4 | 5 | 6 | 0 | 0 | 0 | 0 | 11 | +#> | | 0.156 | 0.188 | 0 | 0 | 0 | 0 | | +#> | | 0.45 | 0.55 | 0 | 0 | 0 | 0 | 0.34 | +#> | | 0.71 | 0.6 | 0 | 0 | 0 | 0 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 6 | 2 | 0 | 0 | 4 | 1 | 0 | 7 | +#> | | 0.062 | 0 | 0 | 0.125 | 0.031 | 0 | | +#> | | 0.29 | 0 | 0 | 0.57 | 0.14 | 0 | 0.22 | +#> | | 0.29 | 0 | 0 | 0.4 | 1 | 0 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 8 | 0 | 4 | 3 | 6 | 0 | 1 | 14 | +#> | | 0 | 0.125 | 0.094 | 0.188 | 0 | 0.031 | | +#> | | 0 | 0.29 | 0.21 | 0.43 | 0 | 0.07 | 0.44 | +#> | | 0 | 0.4 | 1 | 0.6 | 0 | 1 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | Column Total | 7 | 10 | 3 | 10 | 1 | 1 | 32 | +#> | | 0.218 | 0.313 | 0.094 | 0.313 | 0.031 | 0.031 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> +#> +#> vs vs am +#> ------------------------------------------------------------- +#> | | am | +#> ------------------------------------------------------------- +#> | vs | 0 | 1 | Row Total | +#> ------------------------------------------------------------- +#> | 0 | 12 | 6 | 18 | +#> | | 0.375 | 0.188 | | +#> | | 0.67 | 0.33 | 0.56 | +#> | | 0.63 | 0.46 | | +#> ------------------------------------------------------------- +#> | 1 | 7 | 7 | 14 | +#> | | 0.219 | 0.219 | | +#> | | 0.5 | 0.5 | 0.44 | +#> | | 0.37 | 0.54 | | +#> ------------------------------------------------------------- +#> | Column Total | 19 | 13 | 32 | +#> | | 0.594 | 0.407 | | +#> ------------------------------------------------------------- +#> 
+#> +#> vs vs gear +#> ---------------------------------------------------------------------------- +#> | | gear | +#> ---------------------------------------------------------------------------- +#> | vs | 3 | 4 | 5 | Row Total | +#> ---------------------------------------------------------------------------- +#> | 0 | 12 | 2 | 4 | 18 | +#> | | 0.375 | 0.062 | 0.125 | | +#> | | 0.67 | 0.11 | 0.22 | 0.56 | +#> | | 0.8 | 0.17 | 0.8 | | +#> ---------------------------------------------------------------------------- +#> | 1 | 3 | 10 | 1 | 14 | +#> | | 0.094 | 0.312 | 0.031 | | +#> | | 0.21 | 0.71 | 0.07 | 0.44 | +#> | | 0.2 | 0.83 | 0.2 | | +#> ---------------------------------------------------------------------------- +#> | Column Total | 15 | 12 | 5 | 32 | +#> | | 0.469 | 0.374 | 0.156 | | +#> ---------------------------------------------------------------------------- +#> +#> +#> vs vs carb +#> ------------------------------------------------------------------------------------------------------------------------- +#> | | carb | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | vs | 1 | 2 | 3 | 4 | 6 | 8 | Row Total | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 0 | 0 | 5 | 3 | 8 | 1 | 1 | 18 | +#> | | 0 | 0.156 | 0.094 | 0.25 | 0.031 | 0.031 | | +#> | | 0 | 0.28 | 0.17 | 0.44 | 0.06 | 0.06 | 0.56 | +#> | | 0 | 0.5 | 1 | 0.8 | 1 | 1 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 1 | 7 | 5 | 0 | 2 | 0 | 0 | 14 | +#> | | 0.219 | 0.156 | 0 | 0.062 | 0 | 0 | | +#> | | 0.5 | 0.36 | 0 | 0.14 | 0 | 0 | 0.44 | +#> | | 1 | 0.5 | 0 | 0.2 | 0 | 0 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | Column Total | 7 | 10 | 3 | 10 | 1 | 1 | 
32 | +#> | | 0.219 | 0.312 | 0.094 | 0.312 | 0.031 | 0.031 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> +#> +#> am vs gear +#> ---------------------------------------------------------------------------- +#> | | gear | +#> ---------------------------------------------------------------------------- +#> | am | 3 | 4 | 5 | Row Total | +#> ---------------------------------------------------------------------------- +#> | 0 | 15 | 4 | 0 | 19 | +#> | | 0.469 | 0.125 | 0 | | +#> | | 0.79 | 0.21 | 0 | 0.59 | +#> | | 1 | 0.33 | 0 | | +#> ---------------------------------------------------------------------------- +#> | 1 | 0 | 8 | 5 | 13 | +#> | | 0 | 0.25 | 0.156 | | +#> | | 0 | 0.62 | 0.38 | 0.41 | +#> | | 0 | 0.67 | 1 | | +#> ---------------------------------------------------------------------------- +#> | Column Total | 15 | 12 | 5 | 32 | +#> | | 0.469 | 0.375 | 0.156 | | +#> ---------------------------------------------------------------------------- +#> +#> +#> am vs carb +#> ------------------------------------------------------------------------------------------------------------------------- +#> | | carb | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | am | 1 | 2 | 3 | 4 | 6 | 8 | Row Total | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 0 | 3 | 6 | 3 | 7 | 0 | 0 | 19 | +#> | | 0.094 | 0.188 | 0.094 | 0.219 | 0 | 0 | | +#> | | 0.16 | 0.32 | 0.16 | 0.37 | 0 | 0 | 0.6 | +#> | | 0.43 | 0.6 | 1 | 0.7 | 0 | 0 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 1 | 4 | 4 | 0 | 3 | 1 | 1 | 13 | +#> | | 0.125 | 0.125 | 0 | 0.094 | 0.031 | 0.031 | | +#> | | 0.31 | 0.31 | 0 | 0.23 | 0.08 | 0.08 | 0.41 | +#> | | 0.57 | 0.4 | 0 | 0.3 | 
1 | 1 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | Column Total | 7 | 10 | 3 | 10 | 1 | 1 | 32 | +#> | | 0.219 | 0.313 | 0.094 | 0.313 | 0.031 | 0.031 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> +#> +#> gear vs carb +#> ------------------------------------------------------------------------------------------------------------------------- +#> | | carb | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | gear | 1 | 2 | 3 | 4 | 6 | 8 | Row Total | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 3 | 3 | 4 | 3 | 5 | 0 | 0 | 15 | +#> | | 0.094 | 0.125 | 0.094 | 0.156 | 0 | 0 | | +#> | | 0.2 | 0.27 | 0.2 | 0.33 | 0 | 0 | 0.47 | +#> | | 0.43 | 0.4 | 1 | 0.5 | 0 | 0 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 4 | 4 | 4 | 0 | 4 | 0 | 0 | 12 | +#> | | 0.125 | 0.125 | 0 | 0.125 | 0 | 0 | | +#> | | 0.33 | 0.33 | 0 | 0.33 | 0 | 0 | 0.38 | +#> | | 0.57 | 0.4 | 0 | 0.4 | 0 | 0 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | 5 | 0 | 2 | 0 | 1 | 1 | 1 | 5 | +#> | | 0 | 0.062 | 0 | 0.031 | 0.031 | 0.031 | | +#> | | 0 | 0.4 | 0 | 0.2 | 0.2 | 0.2 | 0.16 | +#> | | 0 | 0.2 | 0 | 0.1 | 1 | 1 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> | Column Total | 7 | 10 | 3 | 10 | 1 | 1 | 32 | +#> | | 0.219 | 0.312 | 0.094 | 0.312 | 0.031 | 0.031 | | +#> ------------------------------------------------------------------------------------------------------------------------- +#> 
+#>ds_auto_cross_table(mtcarz, cyl, gear, am)#> Cell Contents +#> |---------------| +#> | Frequency | +#> | Percent | +#> | Row Pct | +#> | Col Pct | +#> |---------------| +#> +#> Total Observations: 32 +#> +#> cyl vs gear +#> ---------------------------------------------------------------------------- +#> | | gear | +#> ---------------------------------------------------------------------------- +#> | cyl | 3 | 4 | 5 | Row Total | +#> ---------------------------------------------------------------------------- +#> | 4 | 1 | 8 | 2 | 11 | +#> | | 0.031 | 0.25 | 0.062 | | +#> | | 0.09 | 0.73 | 0.18 | 0.34 | +#> | | 0.07 | 0.67 | 0.4 | | +#> ---------------------------------------------------------------------------- +#> | 6 | 2 | 4 | 1 | 7 | +#> | | 0.062 | 0.125 | 0.031 | | +#> | | 0.29 | 0.57 | 0.14 | 0.22 | +#> | | 0.13 | 0.33 | 0.2 | | +#> ---------------------------------------------------------------------------- +#> | 8 | 12 | 0 | 2 | 14 | +#> | | 0.375 | 0 | 0.062 | | +#> | | 0.86 | 0 | 0.14 | 0.44 | +#> | | 0.8 | 0 | 0.4 | | +#> ---------------------------------------------------------------------------- +#> | Column Total | 15 | 12 | 5 | 32 | +#> | | 0.468 | 0.375 | 0.155 | | +#> ---------------------------------------------------------------------------- +#> +#> +#> cyl vs am +#> ------------------------------------------------------------- +#> | | am | +#> ------------------------------------------------------------- +#> | cyl | 0 | 1 | Row Total | +#> ------------------------------------------------------------- +#> | 4 | 3 | 8 | 11 | +#> | | 0.094 | 0.25 | | +#> | | 0.27 | 0.73 | 0.34 | +#> | | 0.16 | 0.62 | | +#> ------------------------------------------------------------- +#> | 6 | 4 | 3 | 7 | +#> | | 0.125 | 0.094 | | +#> | | 0.57 | 0.43 | 0.22 | +#> | | 0.21 | 0.23 | | +#> ------------------------------------------------------------- +#> | 8 | 12 | 2 | 14 | +#> | | 0.375 | 0.062 | | +#> | | 0.86 | 0.14 | 0.44 | +#> | | 0.63 | 0.15 | | +#> 
------------------------------------------------------------- +#> | Column Total | 19 | 13 | 32 | +#> | | 0.594 | 0.406 | | +#> ------------------------------------------------------------- +#> +#> +#> gear vs am +#> ------------------------------------------------------------- +#> | | am | +#> ------------------------------------------------------------- +#> | gear | 0 | 1 | Row Total | +#> ------------------------------------------------------------- +#> | 3 | 15 | 0 | 15 | +#> | | 0.469 | 0 | | +#> | | 1 | 0 | 0.47 | +#> | | 0.79 | 0 | | +#> ------------------------------------------------------------- +#> | 4 | 4 | 8 | 12 | +#> | | 0.125 | 0.25 | | +#> | | 0.33 | 0.67 | 0.38 | +#> | | 0.21 | 0.62 | | +#> ------------------------------------------------------------- +#> | 5 | 0 | 5 | 5 | +#> | | 0 | 0.156 | | +#> | | 0 | 1 | 0.16 | +#> | | 0 | 0.38 | | +#> ------------------------------------------------------------- +#> | Column Total | 19 | 13 | 32 | +#> | | 0.594 | 0.406 | | +#> ------------------------------------------------------------- +#> +#>
Generate summary statistics for continuous variables in data, grouped by the levels of each categorical variable.
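The base-R sketch below shows the grouping logic behind the example call ds_auto_group_summary(mtcarz, cyl, gear, mpg, disp): every continuous variable is summarised within each level of every grouping variable. The helper group_stats() is hypothetical and computes only a subset of the statistics descriptr prints. For mpg by cyl its Obs, Mean, Median and Std. Deviation rows agree with the example output.

```r
cars <- mtcars
cars[c("cyl", "gear")] <- lapply(cars[c("cyl", "gear")], factor)

# Hypothetical helper (not part of descriptr): one column per level of g.
group_stats <- function(x, g) {
  round(sapply(split(x, g), function(v) {
    c(Obs = length(v), Minimum = min(v), Maximum = max(v),
      Mean = mean(v), Median = median(v), `Std. Deviation` = sd(v))
  }), 2)
}

# Every continuous variable crossed with every grouping variable,
# in the same order as the example output.
results <- list()
for (num in c("mpg", "disp")) {
  for (grp in c("cyl", "gear")) {
    results[[paste(num, "by", grp)]] <- group_stats(cars[[num]], cars[[grp]])
  }
}

results[["mpg by cyl"]]
```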
+ds_auto_group_summary(data, ...)
Argument | Description
---|---
data | A data.frame or tibble.
... | Column(s) in data.
+ds_auto_group_summary(mtcarz, cyl, gear, mpg, disp)#> mpg by cyl +#> ----------------------------------------------------------------------------------------- +#> | Statistic/Levels| 4| 6| 8| +#> ----------------------------------------------------------------------------------------- +#> | Obs| 11| 7| 14| +#> | Minimum| 21.4| 17.8| 10.4| +#> | Maximum| 33.9| 21.4| 19.2| +#> | Mean| 26.66| 19.74| 15.1| +#> | Median| 26| 19.7| 15.2| +#> | Mode| 22.8| 21| 10.4| +#> | Std. Deviation| 4.51| 1.45| 2.56| +#> | Variance| 20.34| 2.11| 6.55| +#> | Skewness| 0.35| -0.26| -0.46| +#> | Kurtosis| -1.43| -1.83| 0.33| +#> | Uncorrected SS| 8023.83| 2741.14| 3277.34| +#> | Corrected SS| 203.39| 12.68| 85.2| +#> | Coeff Variation| 16.91| 7.36| 16.95| +#> | Std. Error Mean| 1.36| 0.55| 0.68| +#> | Range| 12.5| 3.6| 8.8| +#> | Interquartile Range| 7.6| 2.35| 1.85| +#> ----------------------------------------------------------------------------------------- +#> +#> +#> +#> mpg by gear +#> ----------------------------------------------------------------------------------------- +#> | Statistic/Levels| 3| 4| 5| +#> ----------------------------------------------------------------------------------------- +#> | Obs| 15| 12| 5| +#> | Minimum| 10.4| 17.8| 15| +#> | Maximum| 21.5| 33.9| 30.4| +#> | Mean| 16.11| 24.53| 21.38| +#> | Median| 15.5| 22.8| 19.7| +#> | Mode| 10.4| 21| 15| +#> | Std. Deviation| 3.37| 5.28| 6.66| +#> | Variance| 11.37| 27.84| 44.34| +#> | Skewness| -0.09| 0.7| 0.56| +#> | Kurtosis| -0.38| -0.77| -1.83| +#> | Uncorrected SS| 4050.52| 7528.9| 2462.89| +#> | Corrected SS| 159.15| 306.29| 177.37| +#> | Coeff Variation| 20.93| 21.51| 31.15| +#> | Std. 
Error Mean| 0.87| 1.52| 2.98| +#> | Range| 11.1| 16.1| 15.4| +#> | Interquartile Range| 3.9| 7.08| 10.2| +#> ----------------------------------------------------------------------------------------- +#> +#> +#> +#> disp by cyl +#> ----------------------------------------------------------------------------------------- +#> | Statistic/Levels| 4| 6| 8| +#> ----------------------------------------------------------------------------------------- +#> | Obs| 11| 7| 14| +#> | Minimum| 71.1| 145| 275.8| +#> | Maximum| 146.7| 258| 472| +#> | Mean| 105.14| 183.31| 353.1| +#> | Median| 108| 167.6| 350.5| +#> | Mode| 71.1| 160| 275.8| +#> | Std. Deviation| 26.87| 41.56| 67.77| +#> | Variance| 722.08| 1727.44| 4592.95| +#> | Skewness| 0.16| 1.3| 0.57| +#> | Kurtosis| -1.41| 0.39| -0.86| +#> | Uncorrected SS| 128811| 245593.5| 1805223| +#> | Corrected SS| 7220.83| 10364.63| 59708.38| +#> | Coeff Variation| 25.56| 22.67| 19.19| +#> | Std. Error Mean| 8.1| 15.71| 18.11| +#> | Range| 75.6| 113| 196.2| +#> | Interquartile Range| 41.8| 36.3| 88.25| +#> ----------------------------------------------------------------------------------------- +#> +#> +#> +#> disp by gear +#> ----------------------------------------------------------------------------------------- +#> | Statistic/Levels| 3| 4| 5| +#> ----------------------------------------------------------------------------------------- +#> | Obs| 15| 12| 5| +#> | Minimum| 120.1| 71.1| 95.1| +#> | Maximum| 472| 167.6| 351| +#> | Mean| 326.3| 123.02| 202.48| +#> | Median| 318| 130.9| 145| +#> | Mode| 275.8| 160| 95.1| +#> | Std. Deviation| 94.85| 38.91| 115.49| +#> | Variance| 8997.04| 1513.93| 13338.09| +#> | Skewness| -0.3| -0.23| 0.61| +#> | Kurtosis| 0.2| -1.83| -2.59| +#> | Uncorrected SS| 1723034| 198250.4| 258343.1| +#> | Corrected SS| 125958.6| 16653.24| 53352.35| +#> | Coeff Variation| 29.07| 31.63| 57.04| +#> | Std. 
Error Mean| 24.49| 11.23| 51.65| +#> | Range| 351.9| 96.5| 255.9| +#> | Interquartile Range| 104.2| 81.08| 180.7| +#> ----------------------------------------------------------------------------------------- +#> +#> +#>+
R/ds-auto-summary.R
+ ds_auto_summary_stats.Rd
Generate summary statistics and frequency tables for all continuous variables in data.
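ds_auto_summary_stats() pairs univariate statistics with a binned frequency table. The base-R sketch below reproduces that idea for a single variable. It assumes five equal-width bins spanning the variable's range, which is inferred from the example output rather than from descriptr's source. For mpg, these assumptions give the same bins and counts as the example output below. The names x, breaks, freq_table and mpg_stats are hypothetical.

```r
x <- mtcars$mpg
n_bins <- 5  # assumption: five equal-width bins, as in the example output

# Equal-width break points from min to max; include.lowest keeps the minimum.
breaks <- seq(min(x), max(x), length.out = n_bins + 1)
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

freq <- as.vector(table(bins))
freq_table <- data.frame(
  Bins = paste(round(head(breaks, -1), 1), "-", round(tail(breaks, -1), 1)),
  Frequency = freq,
  `Cum Frequency` = cumsum(freq),
  Percent = round(100 * freq / sum(freq), 2),
  check.names = FALSE
)
freq_table

# A few of the univariate statistics printed above the frequency table.
mpg_stats <- round(c(N = length(x), Mean = mean(x), Median = median(x),
                     `Std Deviation` = sd(x), Range = diff(range(x))), 2)
mpg_stats
```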
+ds_auto_summary_stats(data, ...)
Argument | Description
---|---
data | A data.frame or tibble.
... | Column(s) in data.
+ds_auto_summary_stats(mtcarz)#> --------------------------------- Variable: mpg -------------------------------- +#> +#> ------------------------------ Summary Statistics ------------------------------ +#> +#> --------------------------------- Variable: mpg -------------------------------- +#> +#> Univariate Analysis +#> +#> N 32.00 Variance 36.32 +#> Missing 0.00 Std Deviation 6.03 +#> Mean 20.09 Range 23.50 +#> Median 19.20 Interquartile Range 7.38 +#> Mode 10.40 Uncorrected SS 14042.31 +#> Trimmed Mean 19.95 Corrected SS 1126.05 +#> Skewness 0.67 Coeff Variation 30.00 +#> Kurtosis -0.02 Std Error Mean 1.07 +#> +#> Quantiles +#> +#> Quantile Value +#> +#> Max 33.90 +#> 99% 33.44 +#> 95% 31.30 +#> 90% 30.09 +#> Q3 22.80 +#> Median 19.20 +#> Q1 15.43 +#> 10% 14.34 +#> 5% 12.00 +#> 1% 10.40 +#> Min 10.40 +#> +#> Extreme Values +#> +#> Low High +#> +#> Obs Value Obs Value +#> 15 10.4 20 33.9 +#> 16 10.4 18 32.4 +#> 24 13.3 19 30.4 +#> 7 14.3 28 30.4 +#> 17 14.7 26 27.3 +#> +#> +#> +#> NULL +#> +#> +#> ---------------------------- Frequency Distribution ---------------------------- +#> +#> Variable: mpg +#> |-----------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> |-----------------------------------------------------------------------| +#> | 10.4 - 15.1 | 6 | 6 | 18.75 | 18.75 | +#> |-----------------------------------------------------------------------| +#> | 15.1 - 19.8 | 12 | 18 | 37.5 | 56.25 | +#> |-----------------------------------------------------------------------| +#> | 19.8 - 24.5 | 8 | 26 | 25 | 81.25 | +#> |-----------------------------------------------------------------------| +#> | 24.5 - 29.2 | 2 | 28 | 6.25 | 87.5 | +#> |-----------------------------------------------------------------------| +#> | 29.2 - 33.9 | 4 | 32 | 12.5 | 100 | +#> |-----------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> 
|-----------------------------------------------------------------------| +#> +#> +#> -------------------------------- Variable: disp -------------------------------- +#> +#> ------------------------------ Summary Statistics ------------------------------ +#> +#> -------------------------------- Variable: disp -------------------------------- +#> +#> Univariate Analysis +#> +#> N 32.00 Variance 15360.80 +#> Missing 0.00 Std Deviation 123.94 +#> Mean 230.72 Range 400.90 +#> Median 196.30 Interquartile Range 205.18 +#> Mode 275.80 Uncorrected SS 2179627.47 +#> Trimmed Mean 228.00 Corrected SS 476184.79 +#> Skewness 0.42 Coeff Variation 53.72 +#> Kurtosis -1.07 Std Error Mean 21.91 +#> +#> Quantiles +#> +#> Quantile Value +#> +#> Max 472.00 +#> 99% 468.28 +#> 95% 449.00 +#> 90% 396.00 +#> Q3 326.00 +#> Median 196.30 +#> Q1 120.83 +#> 10% 80.61 +#> 5% 77.35 +#> 1% 72.53 +#> Min 71.10 +#> +#> Extreme Values +#> +#> Low High +#> +#> Obs Value Obs Value +#> 20 71.1 15 472 +#> 19 75.7 16 460 +#> 18 78.7 17 440 +#> 26 79 25 400 +#> 28 95.1 5 360 +#> +#> +#> +#> NULL +#> +#> +#> ---------------------------- Frequency Distribution ---------------------------- +#> +#> Variable: disp +#> |---------------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> |---------------------------------------------------------------------------| +#> | 71.1 - 151.3 | 12 | 12 | 37.5 | 37.5 | +#> |---------------------------------------------------------------------------| +#> | 151.3 - 231.5 | 5 | 17 | 15.62 | 53.12 | +#> |---------------------------------------------------------------------------| +#> | 231.5 - 311.6 | 6 | 23 | 18.75 | 71.88 | +#> |---------------------------------------------------------------------------| +#> | 311.6 - 391.8 | 5 | 28 | 15.62 | 87.5 | +#> |---------------------------------------------------------------------------| +#> | 391.8 - 472 | 4 | 32 | 12.5 | 100 | +#> 
|---------------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> |---------------------------------------------------------------------------| +#> +#> +#> --------------------------------- Variable: hp --------------------------------- +#> +#> ------------------------------ Summary Statistics ------------------------------ +#> +#> --------------------------------- Variable: hp --------------------------------- +#> +#> Univariate Analysis +#> +#> N 32.00 Variance 4700.87 +#> Missing 0.00 Std Deviation 68.56 +#> Mean 146.69 Range 283.00 +#> Median 123.00 Interquartile Range 83.50 +#> Mode 110.00 Uncorrected SS 834278.00 +#> Trimmed Mean 143.57 Corrected SS 145726.88 +#> Skewness 0.80 Coeff Variation 46.74 +#> Kurtosis 0.28 Std Error Mean 12.12 +#> +#> Quantiles +#> +#> Quantile Value +#> +#> Max 335.00 +#> 99% 312.99 +#> 95% 253.55 +#> 90% 243.50 +#> Q3 180.00 +#> Median 123.00 +#> Q1 96.50 +#> 10% 66.00 +#> 5% 63.65 +#> 1% 55.10 +#> Min 52.00 +#> +#> Extreme Values +#> +#> Low High +#> +#> Obs Value Obs Value +#> 19 52 31 335 +#> 8 62 29 264 +#> 20 65 7 245 +#> 18 66 24 245 +#> 26 66 17 230 +#> +#> +#> +#> NULL +#> +#> +#> ---------------------------- Frequency Distribution ---------------------------- +#> +#> Variable: hp +#> |-------------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> |-------------------------------------------------------------------------| +#> | 52 - 108.6 | 10 | 10 | 31.25 | 31.25 | +#> |-------------------------------------------------------------------------| +#> | 108.6 - 165.2 | 9 | 19 | 28.12 | 59.38 | +#> |-------------------------------------------------------------------------| +#> | 165.2 - 221.8 | 8 | 27 | 25 | 84.38 | +#> |-------------------------------------------------------------------------| +#> | 221.8 - 278.4 | 4 | 31 | 12.5 | 96.88 | +#> 
|-------------------------------------------------------------------------| +#> | 278.4 - 335 | 1 | 32 | 3.12 | 100 | +#> |-------------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> |-------------------------------------------------------------------------| +#> +#> +#> -------------------------------- Variable: drat -------------------------------- +#> +#> ------------------------------ Summary Statistics ------------------------------ +#> +#> -------------------------------- Variable: drat -------------------------------- +#> +#> Univariate Analysis +#> +#> N 32.00 Variance 0.29 +#> Missing 0.00 Std Deviation 0.53 +#> Mean 3.60 Range 2.17 +#> Median 3.70 Interquartile Range 0.84 +#> Mode 3.07 Uncorrected SS 422.79 +#> Trimmed Mean 3.58 Corrected SS 8.86 +#> Skewness 0.29 Coeff Variation 14.87 +#> Kurtosis -0.45 Std Error Mean 0.09 +#> +#> Quantiles +#> +#> Quantile Value +#> +#> Max 4.93 +#> 99% 4.78 +#> 95% 4.31 +#> 90% 4.21 +#> Q3 3.92 +#> Median 3.70 +#> Q1 3.08 +#> 10% 3.01 +#> 5% 2.85 +#> 1% 2.76 +#> Min 2.76 +#> +#> Extreme Values +#> +#> Low High +#> +#> Obs Value Obs Value +#> 6 2.76 19 4.93 +#> 22 2.76 27 4.43 +#> 15 2.93 20 4.22 +#> 16 3 29 4.22 +#> 12 3.07 32 4.11 +#> +#> +#> +#> NULL +#> +#> +#> ---------------------------- Frequency Distribution ---------------------------- +#> +#> Variable: drat +#> |-------------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> |-------------------------------------------------------------------------| +#> | 2.8 - 3.2 | 11 | 11 | 34.38 | 34.38 | +#> |-------------------------------------------------------------------------| +#> | 3.2 - 3.6 | 4 | 15 | 12.5 | 46.88 | +#> |-------------------------------------------------------------------------| +#> | 3.6 - 4.1 | 10 | 25 | 31.25 | 78.12 | +#> |-------------------------------------------------------------------------| +#> | 4.1 - 4.5 
| 6 | 31 | 18.75 | 96.88 | +#> |-------------------------------------------------------------------------| +#> | 4.5 - 4.9 | 1 | 32 | 3.12 | 100 | +#> |-------------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> |-------------------------------------------------------------------------| +#> +#> +#> --------------------------------- Variable: wt --------------------------------- +#> +#> ------------------------------ Summary Statistics ------------------------------ +#> +#> --------------------------------- Variable: wt --------------------------------- +#> +#> Univariate Analysis +#> +#> N 32.00 Variance 0.96 +#> Missing 0.00 Std Deviation 0.98 +#> Mean 3.22 Range 3.91 +#> Median 3.33 Interquartile Range 1.03 +#> Mode 3.44 Uncorrected SS 360.90 +#> Trimmed Mean 3.20 Corrected SS 29.68 +#> Skewness 0.47 Coeff Variation 30.41 +#> Kurtosis 0.42 Std Error Mean 0.17 +#> +#> Quantiles +#> +#> Quantile Value +#> +#> Max 5.42 +#> 99% 5.40 +#> 95% 5.29 +#> 90% 4.05 +#> Q3 3.61 +#> Median 3.33 +#> Q1 2.58 +#> 10% 1.96 +#> 5% 1.74 +#> 1% 1.54 +#> Min 1.51 +#> +#> Extreme Values +#> +#> Low High +#> +#> Obs Value Obs Value +#> 28 1.513 16 5.424 +#> 19 1.615 17 5.345 +#> 20 1.835 15 5.25 +#> 26 1.935 12 4.07 +#> 27 2.14 25 3.845 +#> +#> +#> +#> NULL +#> +#> +#> ---------------------------- Frequency Distribution ---------------------------- +#> +#> Variable: wt +#> |---------------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> |---------------------------------------------------------------------------| +#> | 1.5 - 2.3 | 6 | 6 | 18.75 | 18.75 | +#> |---------------------------------------------------------------------------| +#> | 2.3 - 3.1 | 6 | 12 | 18.75 | 37.5 | +#> |---------------------------------------------------------------------------| +#> | 3.1 - 3.9 | 16 | 28 | 50 | 87.5 | +#> 
|---------------------------------------------------------------------------| +#> | 3.9 - 4.6 | 1 | 29 | 3.12 | 90.62 | +#> |---------------------------------------------------------------------------| +#> | 4.6 - 5.4 | 3 | 32 | 9.38 | 100 | +#> |---------------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> |---------------------------------------------------------------------------| +#> +#> +#> -------------------------------- Variable: qsec -------------------------------- +#> +#> ------------------------------ Summary Statistics ------------------------------ +#> +#> -------------------------------- Variable: qsec -------------------------------- +#> +#> Univariate Analysis +#> +#> N 32.00 Variance 3.19 +#> Missing 0.00 Std Deviation 1.79 +#> Mean 17.85 Range 8.40 +#> Median 17.71 Interquartile Range 2.01 +#> Mode 17.02 Uncorrected SS 10293.48 +#> Trimmed Mean 17.79 Corrected SS 98.99 +#> Skewness 0.41 Coeff Variation 10.01 +#> Kurtosis 0.86 Std Error Mean 0.32 +#> +#> Quantiles +#> +#> Quantile Value +#> +#> Max 22.90 +#> 99% 22.07 +#> 95% 20.10 +#> 90% 19.99 +#> Q3 18.90 +#> Median 17.71 +#> Q1 16.89 +#> 10% 15.53 +#> 5% 15.05 +#> 1% 14.53 +#> Min 14.50 +#> +#> Extreme Values +#> +#> Low High +#> +#> Obs Value Obs Value +#> 29 14.5 9 22.9 +#> 31 14.6 6 20.22 +#> 24 15.41 21 20.01 +#> 30 15.5 8 20 +#> 7 15.84 20 19.9 +#> +#> +#> +#> NULL +#> +#> +#> ---------------------------- Frequency Distribution ---------------------------- +#> +#> Variable: qsec +#> |-------------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> |-------------------------------------------------------------------------| +#> | 14.5 - 16.2 | 5 | 5 | 15.62 | 15.62 | +#> |-------------------------------------------------------------------------| +#> | 16.2 - 17.9 | 12 | 17 | 37.5 | 53.12 | +#> 
|-------------------------------------------------------------------------| +#> | 17.9 - 19.5 | 10 | 27 | 31.25 | 84.38 | +#> |-------------------------------------------------------------------------| +#> | 19.5 - 21.2 | 4 | 31 | 12.5 | 96.88 | +#> |-------------------------------------------------------------------------| +#> | 21.2 - 22.9 | 1 | 32 | 3.12 | 100 | +#> |-------------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> |-------------------------------------------------------------------------| +#> +#>ds_auto_summary_stats(mtcarz, disp, hp)#> -------------------------------- Variable: disp -------------------------------- +#> +#> ------------------------------ Summary Statistics ------------------------------ +#> +#> -------------------------------- Variable: disp -------------------------------- +#> +#> Univariate Analysis +#> +#> N 32.00 Variance 15360.80 +#> Missing 0.00 Std Deviation 123.94 +#> Mean 230.72 Range 400.90 +#> Median 196.30 Interquartile Range 205.18 +#> Mode 275.80 Uncorrected SS 2179627.47 +#> Trimmed Mean 228.00 Corrected SS 476184.79 +#> Skewness 0.42 Coeff Variation 53.72 +#> Kurtosis -1.07 Std Error Mean 21.91 +#> +#> Quantiles +#> +#> Quantile Value +#> +#> Max 472.00 +#> 99% 468.28 +#> 95% 449.00 +#> 90% 396.00 +#> Q3 326.00 +#> Median 196.30 +#> Q1 120.83 +#> 10% 80.61 +#> 5% 77.35 +#> 1% 72.53 +#> Min 71.10 +#> +#> Extreme Values +#> +#> Low High +#> +#> Obs Value Obs Value +#> 20 71.1 15 472 +#> 19 75.7 16 460 +#> 18 78.7 17 440 +#> 26 79 25 400 +#> 28 95.1 5 360 +#> +#> +#> +#> NULL +#> +#> +#> ---------------------------- Frequency Distribution ---------------------------- +#> +#> Variable: disp +#> |---------------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> |---------------------------------------------------------------------------| +#> | 71.1 - 151.3 | 12 | 12 | 37.5 | 37.5 | +#> 
|---------------------------------------------------------------------------| +#> | 151.3 - 231.5 | 5 | 17 | 15.62 | 53.12 | +#> |---------------------------------------------------------------------------| +#> | 231.5 - 311.6 | 6 | 23 | 18.75 | 71.88 | +#> |---------------------------------------------------------------------------| +#> | 311.6 - 391.8 | 5 | 28 | 15.62 | 87.5 | +#> |---------------------------------------------------------------------------| +#> | 391.8 - 472 | 4 | 32 | 12.5 | 100 | +#> |---------------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> |---------------------------------------------------------------------------| +#> +#> +#> --------------------------------- Variable: hp --------------------------------- +#> +#> ------------------------------ Summary Statistics ------------------------------ +#> +#> --------------------------------- Variable: hp --------------------------------- +#> +#> Univariate Analysis +#> +#> N 32.00 Variance 4700.87 +#> Missing 0.00 Std Deviation 68.56 +#> Mean 146.69 Range 283.00 +#> Median 123.00 Interquartile Range 83.50 +#> Mode 110.00 Uncorrected SS 834278.00 +#> Trimmed Mean 143.57 Corrected SS 145726.88 +#> Skewness 0.80 Coeff Variation 46.74 +#> Kurtosis 0.28 Std Error Mean 12.12 +#> +#> Quantiles +#> +#> Quantile Value +#> +#> Max 335.00 +#> 99% 312.99 +#> 95% 253.55 +#> 90% 243.50 +#> Q3 180.00 +#> Median 123.00 +#> Q1 96.50 +#> 10% 66.00 +#> 5% 63.65 +#> 1% 55.10 +#> Min 52.00 +#> +#> Extreme Values +#> +#> Low High +#> +#> Obs Value Obs Value +#> 19 52 31 335 +#> 8 62 29 264 +#> 20 65 7 245 +#> 18 66 24 245 +#> 26 66 17 230 +#> +#> +#> +#> NULL +#> +#> +#> ---------------------------- Frequency Distribution ---------------------------- +#> +#> Variable: hp +#> |-------------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> 
|-------------------------------------------------------------------------| +#> | 52 - 108.6 | 10 | 10 | 31.25 | 31.25 | +#> |-------------------------------------------------------------------------| +#> | 108.6 - 165.2 | 9 | 19 | 28.12 | 59.38 | +#> |-------------------------------------------------------------------------| +#> | 165.2 - 221.8 | 8 | 27 | 25 | 84.38 | +#> |-------------------------------------------------------------------------| +#> | 221.8 - 278.4 | 4 | 31 | 12.5 | 96.88 | +#> |-------------------------------------------------------------------------| +#> | 278.4 - 335 | 1 | 32 | 3.12 | 100 | +#> |-------------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> |-------------------------------------------------------------------------| +#> +#>+
ds_cross_table()
has been deprecated. Instead use
-ds_cross_table()
.
@@ -80,10 +80,13 @@ @@ -158,15 +161,6 @@# alternate -ds_twoway_table(mtcarz, cyl, gear)#>#> # A tibble: 8 x 6 +ds_twoway_table(mtcarz, cyl, gear)#>#> # A tibble: 8 x 6 #> cyl gear count percent row_percent col_percent -#> <fct> <fct> <int> <dbl> <dbl> <dbl> -#> 1 4 3 1 0.0312 0.0909 0.0667 -#> 2 4 4 8 0.25 0.727 0.667 -#> 3 4 5 2 0.0625 0.182 0.4 -#> 4 6 3 2 0.0625 0.286 0.133 -#> 5 6 4 4 0.125 0.571 0.333 -#> 6 6 5 1 0.0312 0.143 0.2 -#> 7 8 3 12 0.375 0.857 0.8 -#> 8 8 5 2 0.0625 0.143 0.4+#> <fct> <fct> <int> <dbl> <dbl> <dbl> +#> 1 4 3 1 0.0312 0.0909 0.0667 +#> 2 4 4 8 0.25 0.727 0.667 +#> 3 4 5 2 0.0625 0.182 0.4 +#> 4 6 3 2 0.0625 0.286 0.133 +#> 5 6 4 4 0.125 0.571 0.333 +#> 6 6 5 1 0.0312 0.143 0.2 +#> 7 8 3 12 0.375 0.857 0.8 +#> 8 8 5 2 0.0625 0.143 0.4
Any NA values are stripped from x
before computation
takes place.
stat_css()
has been deprecated. Instead use ds_css()
.
NIST/SEMATECH e-Handbook of Statistical Methods
-@@ -179,10 +173,6 @@ds_css(mtcars$mpg)#> [1] 1126.047
Any NA values are stripped from x
before computation
takes place.
stat_cvar()
has been deprecated. Instead use ds_cvar()
.
@@ -169,8 +167,6 @@ds_cvar(mtcars$mpg)#> [1] 29.99881
@@ -133,14 +136,14 @@ds_extreme_obs(mtcarz, mpg)#> # A tibble: 10 x 3 +@@ -83,10 +83,13 @@ds_extreme_obs(mtcarz, mpg)#> # A tibble: 10 x 3 #> type value index -#> <chr> <dbl> <int> -#> 1 high 33.9 20 -#> 2 high 32.4 18 -#> 3 high 30.4 19 -#> 4 high 30.4 28 -#> 5 high 27.3 26 -#> 6 low 10.4 15 -#> 7 low 10.4 16 -#> 8 low 13.3 24 -#> 9 low 14.3 7 -#> 10 low 14.7 17+#> <chr> <dbl> <int> +#> 1 high 33.9 20 +#> 2 high 32.4 18 +#> 3 high 30.4 19 +#> 4 high 30.4 28 +#> 5 high 27.3 26 +#> 6 low 10.4 15 +#> 7 low 10.4 16 +#> 8 low 13.3 24 +#> 9 low 14.3 7 +#> 10 low 14.7 17
Frequency table for factor data and returns the frequency, cumulative
-frequency, frequency percent and cumulative frequency percent.
-barplot.ds_freq_table()
creates bar plot for the
-frequency table created using ds_freq_table()
.
Generates a frequency table for categorical and continuous data and returns the
+frequency, cumulative frequency, frequency percent and cumulative frequency
+percent. plot.ds_freq_table()
creates a bar plot for categorical
+data and a histogram for continuous data.
ds_freq_table(data, variable) +ds_freq_table(data, variable, bins = 5) # S3 method for ds_freq_table plot(x, ...)@@ -156,6 +159,10 @@Arg
variable + Column in
data
.+ bins ++ Number of intervals into which the data must be split.
- x @@ -166,26 +173,13 @@ An object of class
ds_freq_table
.Arg
Value
- --
ds_freq_table
returns an object of class "ds_freq_table"
. -An object of class "ds_freq_table"
is a list containing the -following components:
Frequency table.
freq_table()
has been deprecated. Instead use ds_freq_table()
.
@@ -160,11 +163,6 @@# frequency table +@@ -80,10 +80,13 @@# categorical data ds_freq_table(mtcarz, cyl)#> Variable: cyl #> ----------------------------------------------------------------------- #> Levels Frequency Cum Frequency Percent Cum Percent @@ -202,6 +196,26 @@Examp # barplot k <- ds_freq_table(mtcarz, cyl) plot(k)
+# continuous data +ds_freq_table(mtcarz, mpg)#> Variable: mpg +#> |-----------------------------------------------------------------------| +#> | Bins | Frequency | Cum Frequency | Percent | Cum Percent | +#> |-----------------------------------------------------------------------| +#> | 10.4 - 15.1 | 6 | 6 | 18.75 | 18.75 | +#> |-----------------------------------------------------------------------| +#> | 15.1 - 19.8 | 12 | 18 | 37.5 | 56.25 | +#> |-----------------------------------------------------------------------| +#> | 19.8 - 24.5 | 8 | 26 | 25 | 81.25 | +#> |-----------------------------------------------------------------------| +#> | 24.5 - 29.2 | 2 | 28 | 6.25 | 87.5 | +#> |-----------------------------------------------------------------------| +#> | 29.2 - 33.9 | 4 | 32 | 12.5 | 100 | +#> |-----------------------------------------------------------------------| +#> | Total | 32 | - | 100.00 | - | +#> |-----------------------------------------------------------------------|
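The bins in the `mpg` table above are consistent with five equal-width intervals between the minimum (10.4) and the maximum (33.9), each 4.7 wide. The following base R sketch reproduces those counts, assuming right-closed intervals with the minimum included in the first bin. It is an illustration, not necessarily how `ds_freq_table()` builds its bins, and it uses base R's `mtcars`, assuming its `mpg` column matches the one in `mtcarz`:

```r
# Sketch (assumption): 5 equal-width, right-closed bins over the range of mpg
x <- mtcars$mpg
n_bins <- 5
breaks <- seq(min(x), max(x), length.out = n_bins + 1)

# include.lowest = TRUE keeps the minimum (10.4) inside the first bin
bins <- cut(x, breaks = breaks, include.lowest = TRUE)

freq <- as.vector(table(bins))
freq_table <- data.frame(
  bins          = levels(bins),
  frequency     = freq,
  cum_frequency = cumsum(freq),
  percent       = round(100 * freq / length(x), 2),
  cum_percent   = round(100 * cumsum(freq) / length(x), 2)
)
freq_table
```

Changing `n_bins` plays the same role as the `bins` argument documented above, under the equal-width assumption.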
Returns the geometric mean of x
gmean()
has been deprecated. Instead use ds_gmean()
.
Data for boxplot method.
ds_group_summary()
has been deprecated. Instead
-use ds_group_summary()
.
Returns the harmonic mean of x
hmean()
has been deprecated. Instead use ds_hmean()
.
Any NA values are stripped from x
before computation
takes place.
kurtosis()
has been deprecated. Instead use ds_kurtosis()
.
Sheskin, D.J. (2000) Handbook of Parametric and Nonparametric Statistical Procedures, Second Edition. Boca Raton, Florida: Chapman & Hall/CRC.
@@ -184,8 +182,6 @@mean
. Any NA values are stripped from x
before computation
takes place.
- stat_mdev()
has been deprecated. Instead use ds_mdev()
.
ds_measures_location(data, column, trim = 0.05)+
ds_measures_location(data, ..., trim = 0.05)
A |
column | -Column in |
+ ... | +Column(s) in |
trim | @@ -156,10 +159,22 @@
A |
|||
column | -Column in |
+ ... | +Column(s) in |
---|
@@ -136,7 +139,7 @@ds_measures_symmetry(mtcarz, mpg)#> # A tibble: 1 x 2 -#> skewness kurtosis -#> <dbl> <dbl> -#> 1 0.672 -0.0220+@@ -81,10 +81,13 @@ds_measures_symmetry(mtcarz)#> # A tibble: 6 x 3 +#> var skewness kurtosis +#> <chr> <dbl> <dbl> +#> 1 disp 0.420 -1.07 +#> 2 drat 0.293 -0.450 +#> 3 hp 0.799 0.275 +#> 4 mpg 0.672 -0.0220 +#> 5 qsec 0.406 0.865 +#> 6 wt 0.466 0.417ds_measures_symmetry(mtcarz, mpg)#> # A tibble: 1 x 3 +#> var skewness kurtosis +#> <chr> <dbl> <dbl> +#> 1 mpg 0.672 -0.0220ds_measures_symmetry(mtcarz, mpg, disp)#> # A tibble: 2 x 3 +#> var skewness kurtosis +#> <chr> <dbl> <dbl> +#> 1 disp 0.420 -1.07 +#> 2 mpg 0.672 -0.0220
ds_measures_variation(data, column)+
ds_measures_variation(data, ...)
A |
|||
column | -Column in |
+ ... | +Column(s) in |
---|
@@ -158,11 +161,6 @@ds_measures_variation(mtcarz, mpg)#> # A tibble: 1 x 6 -#> range iqr variance sd coeff_var std_error -#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> -#> 1 23.5 7.38 36.3 6.03 30.0 1.07+@@ -80,10 +80,13 @@ds_measures_variation(mtcarz)#> # A tibble: 6 x 7 +#> var range iqr variance sd coeff_var std_error +#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 disp 401. 205. 15361. 124. 53.7 21.9 +#> 2 drat 2.17 0.840 0.286 0.535 14.9 0.0945 +#> 3 hp 283 83.5 4701. 68.6 46.7 12.1 +#> 4 mpg 23.5 7.38 36.3 6.03 30.0 1.07 +#> 5 qsec 8.40 2.01 3.19 1.79 10.0 0.316 +#> 6 wt 3.91 1.03 0.957 0.978 30.4 0.173ds_measures_variation(mtcarz, mpg)#> # A tibble: 1 x 7 +#> var range iqr variance sd coeff_var std_error +#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 mpg 23.5 7.38 36.3 6.03 30.0 1.07ds_measures_variation(mtcarz, mpg, disp)#> # A tibble: 2 x 7 +#> var range iqr variance sd coeff_var std_error +#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 disp 401. 205. 15361. 124. 53.7 21.9 +#> 2 mpg 23.5 7.38 36.3 6.03 30.0 1.07
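The columns returned by `ds_measures_variation()` line up with standard sample measures. The base R sketch below computes them for `mtcars$mpg`, assuming the usual definitions: sample variance with an n - 1 denominator, coefficient of variation as 100 * sd / mean, and standard error as sd / sqrt(n). Its values agree with the `mpg` row above and with the `ds_cvar()` and `ds_css()` outputs, but it is not the package's own code:

```r
# Sketch (assumption): textbook definitions of the variation measures
x <- mtcars$mpg
n <- length(x)

variation <- data.frame(
  range     = max(x) - min(x),        # 33.9 - 10.4
  iqr       = IQR(x),                 # Q3 - Q1 with the default quantile type 7
  variance  = var(x),                 # sample variance, n - 1 denominator
  sd        = sd(x),
  coeff_var = sd(x) / mean(x) * 100,  # coefficient of variation, in percent
  std_error = sd(x) / sqrt(n)         # standard error of the mean
)

# Corrected sum of squares, as reported by ds_css()
css <- sum((x - mean(x))^2)

round(variation, 2)
css
```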
Any NA values are stripped from x
before computation
takes place.
`stat_mode()` has been deprecated. Instead use `ds_mode()`.
-ds_percentiles(data, column)+
ds_percentiles(data, ...)
A |
|||
column | -Column in |
+ ... | +Column(s) in |
---|
@@ -153,11 +156,6 @@ds_percentiles(mtcarz, mpg)#> # A tibble: 1 x 11 -#> min per1 per5 per10 q1 median q3 per95 per90 per99 max -#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> -#> 1 10.4 10.4 12.0 14.3 15.4 19.2 22.8 31.3 30.1 33.4 33.9+@@ -80,10 +80,13 @@ds_percentiles(mtcarz)#> # A tibble: 6 x 12 +#> var min per1 per5 per10 q1 median q3 per95 per90 per99 max +#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 disp 71.1 72.5 77.4 80.6 121. 196. 326 449 396. 468. 472 +#> 2 drat 2.76 2.76 2.85 3.01 3.08 3.70 3.92 4.31 4.21 4.78 4.93 +#> 3 hp 52 55.1 63.6 66 96.5 123 180 254. 244. 313. 335 +#> 4 mpg 10.4 10.4 12.0 14.3 15.4 19.2 22.8 31.3 30.1 33.4 33.9 +#> 5 qsec 14.5 14.5 15.0 15.5 16.9 17.7 18.9 20.1 20.0 22.1 22.9 +#> 6 wt 1.51 1.54 1.74 1.96 2.58 3.32 3.61 5.29 4.05 5.40 5.42ds_percentiles(mtcarz, mpg)#> # A tibble: 1 x 12 +#> var min per1 per5 per10 q1 median q3 per95 per90 per99 max +#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 mpg 10.4 10.4 12.0 14.3 15.4 19.2 22.8 31.3 30.1 33.4 33.9ds_percentiles(mtcarz, mpg, disp)#> # A tibble: 2 x 12 +#> var min per1 per5 per10 q1 median q3 per95 per90 per99 max +#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 disp 71.1 72.5 77.4 80.6 121. 196. 326 449 396. 468. 472 +#> 2 mpg 10.4 10.4 12.0 14.3 15.4 19.2 22.8 31.3 30.1 33.4 33.9
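The `mpg` row above is consistent with R's default (type 7) sample quantiles. A sketch with base `quantile()`, assuming `ds_percentiles()` relies on that default, and keeping the column order shown in the output (including `per95` before `per90`):

```r
# Probabilities in the same column order as the ds_percentiles() output
probs <- c(min = 0, per1 = 0.01, per5 = 0.05, per10 = 0.10, q1 = 0.25,
           median = 0.50, q3 = 0.75, per95 = 0.95, per90 = 0.90,
           per99 = 0.99, max = 1)

# type = 7 is R's default interpolation between order statistics
pct <- quantile(mtcars$mpg, probs = probs, type = 7, names = FALSE)
names(pct) <- names(probs)
round(pct, 2)
```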
Range of x
stat_range()
has been deprecated. Instead use ds_range()
.
Any NA values are stripped from data
and values
before
computation takes place.
rindex()
has been deprecated. Instead use ds_rindex()
.
@@ -176,8 +174,6 @@ds_rindex(mtcars$mpg, 21)#> [1] 1 2ds_rindex(mtcars$mpg, 22)#> NULL
ds_screener(y) +ds_screener(data) # S3 method for ds_screener plot(x, ...)@@ -147,7 +150,7 @@Arg
y | +data | A |
A |
|||
variable | -Column in |
+ ... | +Column(s) in |
---|
summary_stats()
has been deprecated. Instead use
-ds_summary_stats()
.
@@ -163,14 +166,9 @@ds_summary_stats(mtcarz, mpg)#> Univariate Analysis +@@ -80,10 +80,13 @@ds_summary_stats(mtcarz, mpg)#> --------------------------------- Variable: mpg -------------------------------- +#> +#> Univariate Analysis #> #> N 32.00 Variance 36.32 #> Missing 0.00 Std Deviation 6.03 @@ -198,7 +197,10 @@Examp #> 16 10.4 18 32.4 #> 24 13.3 19 30.4 #> 7 14.3 28 30.4 -#> 17 14.7 26 27.3
+#> 17 14.7 26 27.3 +#> +#> +#>
Any NA values are stripped from data
before computation
takes place.
tailobs()
has been deprecated. Instead use ds_tailobs()
.
Descriptive statistics for multiple variables.
+ +ds_tidy_stats(data, ...)+ +
data | +A |
+
---|---|
... | +Columns in |
+
A tibble.
+ +ds_multi_stats()
has been deprecated. Instead use ds_tidy_stats()
.
+ds_tidy_stats(mtcarz)#> # A tibble: 6 x 16 +#> vars min max mean t_mean median mode range variance stdev skew +#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 disp 71.1 472 231. 228 196. 276. 401. 1.54e+4 124. 0.420 +#> 2 drat 2.76 4.93 3.60 3.58 3.70 3.07 2.17 2.86e-1 0.535 0.293 +#> 3 hp 52 335 147. 144. 123 110 283 4.70e+3 68.6 0.799 +#> 4 mpg 10.4 33.9 20.1 20.0 19.2 10.4 23.5 3.63e+1 6.03 0.672 +#> 5 qsec 14.5 22.9 17.8 17.8 17.7 17.0 8.40 3.19e+0 1.79 0.406 +#> 6 wt 1.51 5.42 3.22 3.20 3.32 3.44 3.91 9.57e-1 0.978 0.466 +#> # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>, q1 <dbl>, +#> # q3 <dbl>, iqrange <dbl>ds_tidy_stats(mtcarz, mpg, disp, hp)#> # A tibble: 3 x 16 +#> vars min max mean t_mean median mode range variance stdev skew +#> <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> +#> 1 disp 71.1 472 231. 228 196. 276. 401. 15361. 124. 0.420 +#> 2 hp 52 335 147. 144. 123 110 283 4701. 68.6 0.799 +#> 3 mpg 10.4 33.9 20.1 20.0 19.2 10.4 23.5 36.3 6.03 0.672 +#> # ... with 5 more variables: kurtosis <dbl>, coeff_var <dbl>, q1 <dbl>, +#> # q3 <dbl>, iqrange <dbl>+
The screener()
function screens data frames and returns details such as variable names, class, levels and missing values. plot.screener()
creates bar plots to visualize the percentage of missing observations for each variable in a data frame.
The following functions ease the process of generating and visualizing descriptive statistics for categorical and continuous data.
+The following functions ease the process of generating and visualizing descriptive statistics for continuous data.
Descriptive statistics and frequency tables
plot(<ds_cross_table>)
ds_twoway_table()
+ Two way table
Tidy descriptive statistics
plot(<ds_freq_cont>)
+ Frequency distribution of continuous data
Measures of location
Groupwise descriptive statistics
Measures of variation
Multiple variable statistics
Measures of symmetry
Multiple One & Two Way Tables
Percentiles
Measures of location
Extreme observations
The following functions ease the process of generating and visualizing descriptive statistics for categorical data.
+Measures of variation
Frequency table
Measures of symmetry
Two way table
Percentiles
Multiple One & Two Way Tables
The following functions generate grouped summary statistics.
+Groupwise descriptive statistics
Extreme observations
Tabulation
Visualize how different parameters affect the shape of a distribution. Compute/visualize probability from a given quantile and quantiles out of given probability.
+The following functions generate plots for different data types.
Generate scatter plots
Generate histograms
Generate density plots
Visualize normal distribution
Generate bar plots
Visualize binomial distribution
Generate box plots
Visualize chi square distribution
Compare distributions
Visualize f distribution
Generate stacked bar plots
Visualize t distribution
Generate grouped bar plots