very rough draft of readme
joshyam-k committed Jan 24, 2024
1 parent 6e46887 commit 90b1aed
Showing 4 changed files with 122 additions and 29 deletions.
77 changes: 68 additions & 9 deletions README.Rmd
@@ -12,26 +12,85 @@ knitr::opts_chunk$set(
)
```

# saeczi

<!-- badges: start -->
[![R-CMD-check](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

# Development Mode

saeczi is still under development. Please use at your own risk!
## saeczi
#### (Small Area Estimation for Continuous Zero Inflated data)

`saeczi` is an R package that implements a small area estimator that uses a two-stage modeling approach for zero-inflated response variables. In particular, we are working with variables that follow a semi-continuous distribution with a mixture of zeroes and positive continuously distributed values. An example can be seen below:

```{r zi-plot, echo=FALSE, message=FALSE, warning=FALSE}
set.seed(6)
library(tidyverse)

# Point mass at zero: 2500 exact zeroes
z <- tibble(
  z = rep(0, 2500)
)
# Positive, continuously distributed part of the response
nz <- tibble(
  nz = rgamma(1000, 3, 4)
)

ggplot() +
  geom_density(
    data = filter(nz, nz > 0.25),
    # after_stat(count) replaces the deprecated ..count.. notation
    aes(x = nz, y = after_stat(count)),
    fill = "#023047",
    color = "white",
    alpha = 0.9
  ) +
  geom_histogram(
    data = z,
    aes(x = z),
    fill = "#023047",
    color = "white",
    alpha = 0.9
  ) +
  labs(
    x = "Response Variable"
  ) +
  theme_bw()
```


# saeczi
`saeczi` first fits a linear mixed model to the non-zero portion of the response and then a generalized linear mixed model with a binomial response to estimate the probability of zero for a given data point. At the estimation stage, each model is applied to new data points and the two predictions are combined to compute a final prediction.
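
The two-stage idea can be sketched in base R. This is a simplified illustration, not the package's implementation: it swaps the mixed models for plain `lm()` and `glm()` fits with no random effects, runs on simulated data, and assumes the two stages are combined by multiplying the predicted probability of a non-zero response (one minus the probability of zero) by the predicted positive value. All object names are hypothetical.

```r
# Simplified sketch: lm()/glm() stand in for the mixed models (no random effects).
set.seed(1)
n <- 500
x <- runif(n)

# Simulate a zero-inflated response: exact zeroes mixed with gamma-distributed values
p_nonzero <- plogis(-1 + 3 * x)
is_nonzero <- rbinom(n, 1, p_nonzero)
y <- is_nonzero * rgamma(n, shape = 3, rate = 4 / (1 + x))
dat <- data.frame(x = x, y = y, nz = as.integer(y > 0))

# Stage 1: linear model fit only to the non-zero portion of the response
fit_lin <- lm(y ~ x, data = dat[dat$y > 0, ])

# Stage 2: binomial model for whether a response is non-zero
fit_bin <- glm(nz ~ x, family = binomial, data = dat)

# Apply both models to new data points and combine (assumed combination rule)
new_dat <- data.frame(x = c(0.1, 0.5, 0.9))
pred_pos <- predict(fit_lin, newdata = new_dat)
pred_p <- predict(fit_bin, newdata = new_dat, type = "response")
pred <- unname(pred_p * pred_pos)
pred
```

Because the second stage shrinks each prediction by the estimated chance of a non-zero response, data points that are likely to be zero receive predictions close to zero.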

saeczi is an R package for fitting a zero-inflation estimator to a sample dataset. To be compatible with the estimator, the data must be a sample dataset, and the domain-level means of the auxiliary variables in the corresponding "population" dataset must be available. To fit the zero-inflation estimator, first analyze the sample data and find a set of auxiliary variables that gives a good fit for both the linear regression model and the logistic regression model. Then decide how many repetitions to use when fitting the bootstrap samples to estimate the variance. Once these choices are made, the `unit_zi` function can be used to produce domain-level estimates from the sample dataset.
The package can also generate MSE estimates using a parametric bootstrap approach described in Chandra and Sud (2012) either in parallel or sequentially.
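
As a rough illustration of what a parametric bootstrap MSE estimate involves, here is a generic base R sketch. It is not the Chandra and Sud (2012) procedure and not the package's code: it uses a single ordinary linear model on simulated data, one domain, and hypothetical object names. Each replicate simulates a new population from the fitted model, refits the model on the sampled units, and records the squared error of the re-estimated mean.

```r
# Generic parametric bootstrap MSE sketch (simulated data, plain lm(), one domain).
set.seed(2)
N <- 2000                      # population size
n <- 200                       # sample size
pop_x <- runif(N)
y_pop <- 1 + 2 * pop_x + rnorm(N, sd = 0.5)
samp_idx <- sample(N, n)
samp <- data.frame(x = pop_x[samp_idx], y = y_pop[samp_idx])

# Fit the working model to the observed sample
fit <- lm(y ~ x, data = samp)
sigma_hat <- summary(fit)$sigma
pop_frame <- data.frame(x = pop_x)
mu_hat <- predict(fit, newdata = pop_frame)

B <- 100                       # number of bootstrap replicates
sq_err <- numeric(B)
for (b in seq_len(B)) {
  # Simulate a bootstrap population from the fitted model
  y_star <- mu_hat + rnorm(N, sd = sigma_hat)
  true_b <- mean(y_star)

  # Refit on the same sampled units and re-estimate the population mean
  samp_b <- data.frame(x = pop_x[samp_idx], y = y_star[samp_idx])
  fit_b <- lm(y ~ x, data = samp_b)
  est_b <- mean(predict(fit_b, newdata = pop_frame))

  sq_err[b] <- (est_b - true_b)^2
}

# Bootstrap MSE estimate: average squared error across replicates
mse_hat <- mean(sq_err)
mse_hat
```

The replicates are independent of one another, which is why this kind of loop can be run either sequentially or in parallel.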

## Installation

You can install saeczi from GitHub with:
You can install the development version of `saeczi` from GitHub with:

```{r gh-installation, eval = FALSE}
install.packages("devtools")
devtools::install_github("harvard-ufds/saeczi")
# install.packages("pak")
pak::pkg_install("harvard-ufds/saeczi")
```

## Example

We'll use the internal package datasets to show an example of how to use `saeczi`.

```{r, warning=FALSE, message=FALSE}
library(saeczi)
data(pop)
data(samp)
lin_formula <- DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev
set.seed(5)
result <- unit_zi(samp_dat = samp,
                  pop_dat = pop,
                  lin_formula = DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev,
                  log_formula = DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev,
                  domain_level = "COUNTYFIPS",
                  mse_est = TRUE,
                  B = 100,
                  parallel = FALSE)
result$res |> head()
```

74 changes: 54 additions & 20 deletions README.md
@@ -1,36 +1,70 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->

# saeczi

<!-- badges: start -->
[![R-CMD-check](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

# Development Mode
## saeczi

#### (Small Area Estimation for Continuous Zero Inflated data)

`saeczi` is an R package that implements a small area estimator that
uses a two-stage modeling approach for zero-inflated response variables.
In particular, we are working with variables that follow a
semi-continuous distribution with a mixture of zeroes and positive
continuously distributed values. An example can be seen below:

saeczi is still under development. Please use at your own risk!
![](README-zi-plot-1.png)<!-- -->

# saeczi
`saeczi` first fits a linear mixed model to the non-zero portion of the
response and then a generalized linear mixed model with a binomial
response to estimate the probability of zero for a given data point. At
the estimation stage, each model is applied to new data points and the
two predictions are combined to compute a final prediction.

saeczi is an R package for fitting a zero-inflation estimator to a
sample dataset. To be compatible with the estimator, the data must be a
sample dataset, and the domain-level means of the auxiliary variables in
the corresponding “population” dataset must be available. To fit the
zero-inflation estimator, first analyze the sample data and find a set
of auxiliary variables that gives a good fit for both the linear
regression model and the logistic regression model. Then decide how many
repetitions to use when fitting the bootstrap samples to estimate the
variance. Once these choices are made, the `unit_zi` function can be
used to produce domain-level estimates from the sample dataset.
The package can also generate MSE estimates using a parametric bootstrap
approach described in Chandra and Sud (2012) either in parallel or
sequentially.

## Installation

You can install saeczi from GitHub with:
You can install the development version of `saeczi` from GitHub with:

``` r
# install.packages("pak")
pak::pkg_install("harvard-ufds/saeczi")
```

## Example

We’ll use the internal package datasets to show an example of how to use
`saeczi`.

``` r
install.packages("devtools")
devtools::install_github("harvard-ufds/saeczi")
library(saeczi)
data(pop)
data(samp)

lin_formula <- DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev

set.seed(5)
result <- unit_zi(samp_dat = samp,
                  pop_dat = pop,
                  lin_formula = DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev,
                  log_formula = DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev,
                  domain_level = "COUNTYFIPS",
                  mse_est = TRUE,
                  B = 100,
                  parallel = FALSE)


result$res |> head()
#> domain mse est
#> 1 41001 61.01495 14.85495
#> 2 41003 87.99835 97.74967
#> 3 41005 176.88206 86.02207
#> 4 41007 344.48027 76.24752
#> 5 41009 76.81402 70.28624
#> 6 41011 80.75565 87.65072
```
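
One way to read output of this shape is to convert each MSE into a root mean squared error and a coefficient of variation, so uncertainty can be compared across domains with different estimate sizes. The sketch below does not call `saeczi`; it rebuilds a small data frame by hand from the six rows printed above, and the `rmse` and `cv` column names are hypothetical additions rather than part of the package's output.

```r
# Values copied by hand from the printed head of result$res above
res <- data.frame(
  domain = c("41001", "41003", "41005", "41007", "41009", "41011"),
  mse = c(61.01495, 87.99835, 176.88206, 344.48027, 76.81402, 80.75565),
  est = c(14.85495, 97.74967, 86.02207, 76.24752, 70.28624, 87.65072)
)

# Root mean squared error, on the same scale as the estimate
res$rmse <- sqrt(res$mse)

# Coefficient of variation: RMSE relative to the size of the estimate
res$cv <- res$rmse / res$est

# Domains ordered from least to most precise in relative terms
res[order(res$cv, decreasing = TRUE), ]
```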
Binary file added figs/README-unnamed-chunk-2-1.png
Binary file added figs/README-zi-plot-1.png