very rough draft of readme

harvard-ufds · Jan 24, 2024 · 90b1aed
1 parent 6e46887
commit 90b1aed

Showing 4 changed files with 122 additions and 29 deletions.
diff --git a/README.Rmd b/README.Rmd
@@ -12,26 +12,85 @@ knitr::opts_chunk$set(
 )
 ```
 
-# saeczi
-
  <!-- badges: start -->
   [![R-CMD-check](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml)
   <!-- badges: end -->
 
-# Development Mode
 
-saeczi is still under development.  Please use at your own risk!  
+## saeczi
+#### (Small Area Estimation for Continuous Zero Inflated data)
+
+`saeczi` is an R package that implements a small area estimator that uses a two-stage modeling approach for zero-inflated response variables. In particular, we are working with variables that follow a semi-continuous distribution with a mixture of zeroes and positive continuously distributed values. An example can be seen below:
+
+```{r zi-plot, echo=FALSE, message=FALSE, warning=FALSE}
+set.seed(6)
+library(tidyverse)
+
+z <- tibble(
+  z = rep(0, 2500)
+)
+
+nz <- tibble(
+  nz = rgamma(1000, 3, 4)
+)
+
+ggplot() +
+  geom_density(
+    data = filter(nz, nz > 0.25),
+    aes(x = nz, y = after_stat(count)),
+    fill = "#023047",
+    color = "white",
+    alpha = 0.9
+    ) +
+  geom_histogram(
+    data = z,
+    aes(x = z),
+    fill = "#023047",
+    color = "white",
+    alpha = 0.9
+    ) +
+  labs(
+    x = "Response Variable"
+  ) +
+  theme_bw()
+```
+
 
-# saeczi
+`saeczi` first fits a linear mixed model to the non-zero portion of the response and then a generalized linear mixed model with a binomial response to estimate the probability of zero for a given data point. At estimation time, each model is applied to new data points and the two predictions are combined to compute a final prediction.
 
-saeczi is an R package that allows for the fitting of a zero-inflation estimator onto a sample dataset. Please note that, in order for a dataset to compatible with the zero-inflation estimator, the dataset must be of a sample dataset, where the means at the domain level of what would be considered a "population" dataset for the auxiliary variables must be available. To fit the zero-inflation estimator,first analyze the sample data and find a set of auxiliary variables that create a good model for both the linear regression model, as well as the logistic regression model. Lastly, assess the number of repetitions desired when fitting the bootstrap sample to estimate the variance. Once all of those things are decided, the unit_zi function can be used to predict domain level estimates of a sample dataset. 
+The package can also generate MSE estimates, either in parallel or sequentially, using the parametric bootstrap approach described in Chandra and Sud (2012).
 
 ## Installation
 
-You can install saezi from github with:
+You can install the development version of `saeczi` from GitHub with:
 
 ```{r gh-installation, eval = FALSE}
-install.packages("devtools")
-devtools::install_github("harvard-ufds/saeczi")
+# install.packages("pak")
+pak::pkg_install("harvard-ufds/saeczi")
+```
+
+## Example
+
+We'll use the internal package datasets to show an example of how to use `saeczi`.
+
+```{r, warning=FALSE, message=FALSE}
+library(saeczi)
+data(pop)
+data(samp)
+
+lin_formula <- DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev
+log_formula <- DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev
+set.seed(5)
+result <- unit_zi(samp_dat = samp,
+                  pop_dat = pop,
+                  lin_formula = lin_formula,
+                  log_formula = log_formula,
+                  domain_level = "COUNTYFIPS",
+                  mse_est = TRUE,
+                  B = 100,
+                  parallel = FALSE)
+
+
+result$res |> head()
 ```
 
diff --git a/README.md b/README.md
@@ -1,36 +1,70 @@
 
 <!-- README.md is generated from README.Rmd. Please edit that file -->
 
-# saeczi
-
 <!-- badges: start -->
 [![R-CMD-check](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml)
 <!-- badges: end -->
 
-# Development Mode
+## saeczi
+
+#### (Small Area Estimation for Continuous Zero Inflated data)
+
+`saeczi` is an R package that implements a small area estimator that
+uses a two-stage modeling approach for zero-inflated response variables.
+In particular, we are working with variables that follow a
+semi-continuous distribution with a mixture of zeroes and positive
+continuously distributed values. An example can be seen below:
 
-saeczi is still under development. Please use at your own risk!
+![](README-zi-plot-1.png)<!-- -->
 
-# saeczi
+`saeczi` first fits a linear mixed model to the non-zero portion of the
+response and then a generalized linear mixed model with a binomial
+response to estimate the probability of zero for a given data point. At
+estimation time, each model is applied to new data points and the two
+predictions are combined to compute a final prediction.
 
-saeczi is an R package that allows for the fitting of a zero-inflation
-estimator onto a sample dataset. Please note that, in order for a
-dataset to compatible with the zero-inflation estimator, the dataset
-must be of a sample dataset, where the means at the domain level of what
-would be considered a “population” dataset for the auxiliary variables
-must be available. To fit the zero-inflation estimator,first analyze the
-sample data and find a set of auxiliary variables that create a good
-model for both the linear regression model, as well as the logistic
-regression model. Lastly, assess the number of repetitions desired when
-fitting the bootstrap sample to estimate the variance. Once all of those
-things are decided, the unit_zi function can be used to predict domain
-level estimates of a sample dataset.
+The package can also generate MSE estimates, either in parallel or
+sequentially, using the parametric bootstrap approach described in
+Chandra and Sud (2012).
 
 ## Installation
 
-You can install saezi from github with:
+You can install the development version of `saeczi` from GitHub with:
+
+``` r
+# install.packages("pak")
+pak::pkg_install("harvard-ufds/saeczi")
+```
+
+## Example
+
+We’ll use the internal package datasets to show an example of how to use
+`saeczi`.
 
 ``` r
-install.packages("devtools")
-devtools::install_github("harvard-ufds/saeczi")
+library(saeczi)
+data(pop)
+data(samp)
+
+lin_formula <- DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev
+log_formula <- DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev
+set.seed(5)
+result <- unit_zi(samp_dat = samp,
+                  pop_dat = pop,
+                  lin_formula = lin_formula,
+                  log_formula = log_formula,
+                  domain_level = "COUNTYFIPS",
+                  mse_est = TRUE,
+                  B = 100,
+                  parallel = FALSE)
+
+
+result$res |> head()
+#>   domain       mse      est
+#> 1  41001  61.01495 14.85495
+#> 2  41003  87.99835 97.74967
+#> 3  41005 176.88206 86.02207
+#> 4  41007 344.48027 76.24752
+#> 5  41009  76.81402 70.28624
+#> 6  41011  80.75565 87.65072
 ```
diff --git a/figs/README-unnamed-chunk-2-1.png b/figs/README-unnamed-chunk-2-1.png
diff --git a/figs/README-zi-plot-1.png b/figs/README-zi-plot-1.png
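
Note on the two-stage approach described in the new README: the idea of fitting one model to the non-zero responses and a second model to the zero/non-zero indicator, then combining the two predictions, can be sketched in base R. This is only an illustration under stated assumptions, not the code `saeczi` runs. It swaps the mixed models for plain `lm()` and `glm()` fits with no domain-level random effects so that it needs no extra packages, it uses simulated data rather than the package's `samp` and `pop` datasets, and it models the probability of a non-zero value (an assumption made here so the two predictions can simply be multiplied). All object names are hypothetical.

```r
# Hypothetical simulated data; not the package's `samp`/`pop` datasets.
set.seed(1)
n <- 500
x <- runif(n)

# Zero-inflated response: a positive gamma draw with probability
# plogis(-1 + 3 * x), and exactly zero otherwise.
is_pos <- rbinom(n, size = 1, prob = plogis(-1 + 3 * x))
y <- is_pos * rgamma(n, shape = 3, rate = 4 / (1 + x))
dat <- data.frame(x = x, y = y, nonzero = as.numeric(y > 0))

# Stage 1: linear model fit only to the non-zero portion of the response.
# (Simplification: lm() instead of a linear mixed model.)
fit_lin <- lm(y ~ x, data = dat[dat$y > 0, ])

# Stage 2: binomial GLM for the zero/non-zero indicator.
# (Simplification: glm() instead of a generalized linear mixed model.)
fit_log <- glm(nonzero ~ x, family = binomial, data = dat)

# Apply both models to new data points and combine them:
# P(non-zero) times the predicted value given non-zero.
new_dat <- data.frame(x = c(0.1, 0.5, 0.9))
p_nonzero <- predict(fit_log, newdata = new_dat, type = "response")
y_pos <- predict(fit_lin, newdata = new_dat)
pred <- p_nonzero * y_pos

print(data.frame(new_dat, p_nonzero, y_pos, pred))
```

Because the probability shrinks each stage-one prediction toward zero, the combined prediction is never larger in magnitude than the linear prediction alone.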