Skip to content

Commit

Permalink
more filled out readme
Browse files Browse the repository at this point in the history
  • Loading branch information
joshyam-k committed Jan 24, 2024
1 parent ed08649 commit 004cd86
Show file tree
Hide file tree
Showing 3 changed files with 69 additions and 29 deletions.
Binary file modified README-zi-plot-1.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
36 changes: 24 additions & 12 deletions README.Rmd
Original file line number Diff line number Diff line change
Expand Up @@ -12,17 +12,18 @@ knitr::opts_chunk$set(
)
```

# saeczi
### (Small Area Estimation for Continuous Zero Inflated data)

<!-- badges: start -->
[![R-CMD-check](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

## Overview

# saeczi
### (Small Area Estimation for Continous Zero Inflated data)

`saeczi` is an R package that implements a small area estimator that uses a two-stage modeling approach for zero-inflated response variables. In particular, we are working with variables that follow a semi-continuous distribution with a mixture of zeroes and positive continuously distributed values. An example can be seen below:
`saeczi` is an R package that implements a small area estimator that uses a two-stage modeling approach for zero-inflated response variables. In particular, we are working with variables that follow a semi-continuous distribution with a mixture of zeroes and positive continuously distributed values. An example can be seen below.

```{r zi-plot, dpi = 300, fig.width = 5, fig.height=3, echo=F, message=F, warning=F}
```{r zi-plot, dpi = 300, fig.width = 10, fig.height=6, echo=F, message=F, warning=F}
set.seed(6)
library(tidyverse)
Expand Down Expand Up @@ -55,7 +56,6 @@ ggplot() +
theme_bw()
```


`saeczi` first fits a linear mixed model to the non-zero portion of the response and then a generalized linear mixed model with binomial response to classify the probability of zero for a given data point. In estimation these models are each applied to new data points and combined to compute a final prediction.

The package can also generate MSE estimates using a parametric bootstrap approach described in Chandra and Sud (2012) either in parallel or sequentially.
Expand All @@ -69,28 +69,40 @@ You can install the developmental version of `saeczi` from GitHub with:
pak::pkg_install("harvard-ufds/saeczi")
```

## Example
## Usage

We'll use the internal package data to show an example of how to use `saeczi`. The two data sets contained within the package contain example forestry data collected by the Forestry Inventory and Analysis (FIA) research program.

- `saeczi::samp`: Example FIA plot-level sample data for each county in Oregon.
- `saeczi::pop`: Example FIA pixel level population auxiliary data for each county in Oregon.

We'll use the internal package datasets to show an example of how to use `saeczi`.
The main response variable included in `samp` is above ground live biomass and our small areas in this case are the counties in Oregon. To keep things simple we will use tree canopy cover (tcc16) and elevation (elev) as our predictors in both of the models. We can use `saeczi` to get estimates for the mean biomass in each county as well as the corresponding bootstrapped (B = 100) MSE estimate as follows.

```{r, warning=FALSE, message=FALSE}
library(saeczi)
data(pop)
data(samp)
lin_formula <- DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev
set.seed(5)
result <- unit_zi(samp_dat = samp,
pop_dat = pop,
lin_formula = DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev,
log_formula = DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev,
domain_level = "COUNTYFIPS",
mse_est = TRUE,
B = 100,
B = 500,
parallel = FALSE)
```


The function returns the original call, a data frame containing the estimates, a log of any modeling warnings and messages from the bootstrap procedure, as well as the linear and logistic model objects used to compute the estimates.

```{r}
names(result)
```

As there are 36 total counties in Oregon, we will just look at the first few rows of the results:

```{r}
result$res |> head()
```

62 changes: 45 additions & 17 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,19 +1,21 @@

<!-- README.md is generated from README.Rmd. Please edit that file -->

# saeczi

### (Small Area Estimation for Continuous Zero Inflated data)

<!-- badges: start -->
[![R-CMD-check](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/harvard-ufds/saeczi/actions/workflows/R-CMD-check.yaml)
<!-- badges: end -->

# saeczi

### (Small Area Estimation for Continous Zero Inflated data)
## Overview

`saeczi` is an R package that implements a small area estimator that
uses a two-stage modeling approach for zero-inflated response variables.
In particular, we are working with variables that follow a
semi-continuous distribution with a mixture of zeroes and positive
continuously distributed values. An example can be seen below:
continuously distributed values. An example can be seen below.

![](README-zi-plot-1.png)<!-- -->

Expand All @@ -36,35 +38,61 @@ You can install the developmental version of `saeczi` from GitHub with:
pak::pkg_install("harvard-ufds/saeczi")
```

## Example
## Usage

We’ll use the internal package data to show an example of how to use
`saeczi`. The two data sets contained within the package contain example
forestry data collected by the Forestry Inventory and Analysis (FIA)
research program.

- `saeczi::samp`: Example FIA plot-level sample data for each county in
Oregon.
- `saeczi::pop`: Example FIA pixel level population auxiliary data for
each county in Oregon.

We’ll use the internal package datasets to show an example of how to use
`saeczi`.
The main response variable included in `samp` is above ground live
biomass and our small areas in this case are the counties in Oregon. To
keep things simple we will use tree canopy cover (tcc16) and elevation
(elev) as our predictors in both of the models. We can use `saeczi` to
get estimates for the mean biomass in each county as well as the
corresponding bootstrapped (B = 100) MSE estimate as follows.

``` r
library(saeczi)
data(pop)
data(samp)

lin_formula <- DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev

set.seed(5)
result <- unit_zi(samp_dat = samp,
pop_dat = pop,
lin_formula = DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev,
log_formula = DRYBIO_AG_TPA_live_ADJ ~ tcc16 + elev,
domain_level = "COUNTYFIPS",
mse_est = TRUE,
B = 100,
B = 500,
parallel = FALSE)
```

The function returns the original call, a data frame containing the
estimates, a log of any modeling warnings and messages from the
bootstrap procedure, as well as the linear and logistic model objects
used to compute the estimates.

``` r
names(result)
#> [1] "call" "res" "bootstrap_log" "lin_mod"
#> [5] "log_mod"
```

As there are 36 total counties in Oregon, we will just look at the first
few rows of the results:

``` r
result$res |> head()
#> domain mse est
#> 1 41001 61.01495 14.85495
#> 2 41003 87.99835 97.74967
#> 3 41005 176.88206 86.02207
#> 4 41007 344.48027 76.24752
#> 5 41009 76.81402 70.28624
#> 6 41011 80.75565 87.65072
#> 1 41001 348.52996 14.85495
#> 2 41003 10.02939 97.74967
#> 3 41005 529.20263 86.02207
#> 4 41007 245.04692 76.24752
#> 5 41009 481.13961 70.28624
#> 6 41011 269.96902 87.65072
```

0 comments on commit 004cd86

Please sign in to comment.