diff --git a/.gitignore b/.gitignore
index 6f41851..dfc3e64 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,4 +8,5 @@ TAGS
*~
# Undo-tree save-files
-*.~undo-tree
\ No newline at end of file
+*.~undo-tree
+inst/doc
diff --git a/DESCRIPTION b/DESCRIPTION
index 11a50f7..d3a7606 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -34,3 +34,7 @@ Imports: coda,
ggpubr,
formula.tools
RoxygenNote: 6.1.1
+Suggests:
+ knitr,
+ rmarkdown
+VignetteBuilder: knitr
diff --git a/vignettes/.gitignore b/vignettes/.gitignore
new file mode 100644
index 0000000..097b241
--- /dev/null
+++ b/vignettes/.gitignore
@@ -0,0 +1,2 @@
+*.html
+*.R
diff --git a/vignettes/Report.Rmd b/vignettes/Report.Rmd
new file mode 100644
index 0000000..7238636
--- /dev/null
+++ b/vignettes/Report.Rmd
@@ -0,0 +1,247 @@
+---
+title: "Introduction to hdpGLM"
+author: "Jiewen Luo"
+date: "`r Sys.Date()`"
+header-includes:
+ - \usepackage{amsmath}
+ - \usepackage{amsfonts}
+ - \usepackage{amssymb}
+
+output: rmarkdown::html_vignette
+bibliography: references.bib
+vignette: >
+ %\VignetteIndexEntry{Introduction to hdpGLM}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+---
+
+```{r, include = FALSE}
+knitr::opts_chunk$set(
+ collapse = TRUE,
+ comment = "#>"
+)
+```
+
+## Summary
+The hdpGLM package implements the hierarchical Dirichlet process Generalized Linear Models (GLM) proposed by @ferrari2020modeling. The model can be used to estimate latent heterogeneity in the marginal effect of GLM linear coefficients, cluster data points based on that latent heterogeneity, and investigate if Simpson’s Paradox occurs due to latent or omitted features. It also can be used with hierarchical data to estimate the effect of upper-level covariates on the latent heterogeneity in the effect of lower-level features.
+
+## Technical Details
+@ferrari2020modeling proposes a hierarchical Dirichlet process of generalized linear models to estimate heterogeneous marginal effects. To estimate such models, a (blocked) Gibbs sampler was adopted to handle continuous outcome variables; whereas a Hamiltonian Monte Carlo within Gibbs was adopted to handle discrete outcome variables.\
+
+### (I) Modeling
+ For each observation $i$, denote:\
+(i) $y_i$ as the outcome variable; (ii) $X_i \in \mathbb{R}^{D_X+1}$ as the set of observed covariates including a column of one for the intercept term; (iii) $K$ as the number of heterogeneous groups in a given population (context) such that it can be bigger if the population is bigger; (iv) $Z_i=k$, or equivalently $Z_{ik}$, as the group of $i$ in a given population (context), where $k \in \{1, \dots, K\}$; (v) $J$ as the number of different populations (contexts); (vi) $C_i =j$, or equivalently $C_{ij}$, as the context of observation $i$, where $j \in \{1, \dots, J\}$; (vii) $W_j \in \mathbb{R}^{(D_W+1)}$ denotes context-level features of context j including a column of
+one for the intercept term, for $j \in \{1, \dots, J\}$; (viii) $\tau \in \mathbb{R}^{(D_W+1) \times(D_X+1)}$ as the set of parameters; \
+
+GLM are one of the most used tools for multivariate analyses. However, when $Z_i$ is latent, $K$ is unknown, and heterogeneity is also varied by context, conventional GLM are biased and inconsistent. Therefore, @ferrari2020modeling have suggested integrating GLM with context-dependent Dirichlet process prior [@mallick1997combining;@carota2002semiparametric;@de2004anova;@muller2004method;@teh2005sharing] to estimate cluster-specific effects. More specifically, if p(.) is a distribution in the exponential family, $g$ is a link function, and $\theta=(\beta, \sigma)$, then the group- and context-specific hierarchical Dirichlet process generalized linear model (hdpGLM) is given by:\
+
+\begin{align}
+&G_j|\alpha_0, G_0, W_j \sim \mathcal{DP}(\alpha_0, G_0(W_j))\\
+&\theta_{ij} | G_j \sim G_j\\
+&y_i|X_i, C_i, \theta_{ij}, \sim p(y_i|X_i, \theta_{ij}), \mathbb{E}[y_i|.]=\mu_{ij}=g^{-1}(X_i^T\mathbb{\beta}_{ij}).
+\end{align}
+
+
+In this model, each atom $\theta_{ij}$ follows a hierarchical Dirichlet process $G_j$ with scale parameter $\alpha_0$ and context-dependent base measure $G_0(W_j)$. And the conditional distribution of $y$ given data and atom $\theta_{ij}$ follows a exponential distribution. More desirably, one can use the stick-breaking construction [@sethuraman1994constructive;@teh2005sharing] to rewrite the hdpGLM as followed:\
+
+\begin{align}
+&V_I|\alpha_0 \sim Beta(1, \alpha_0)\\
+&\pi_k = \left\{
+ \begin{array}{ll}
+ V_1 & k=1 \\
+ V_k\prod_{I=1}^{K-1}(1-V_I) & k > 1
+ \end{array}
+ \right.\\
+
+&Z_i|\pi \sim Cat(\pi), \pi \in \Delta^\infty \\
+
+&\tau_d \sim p(\tau_d), d=1, \dots, D_x+1 \\
+
+&\theta_{kj}|Z_{ik}, \tau, C_{ij}, W \sim p(\theta_{jk}|W, \tau), j=1, \dots, J\\
+
+&
+
+y_i|Z_{ik},\theta_{kj}, X_i, C_{ij} \sim p(y_i|Z_{ik}, C_{ij}, X_i, \theta_{kj}) \ni \mathbb{E}[y_i|Z_{ik},\theta_{kj}, X_i, C_{ij}]=\mu_i=g^{-1}(X_i^T\mathbb{\beta}_{kj}),\\
+& p(y_i|Z_{ik}, C_{ij}, X_i, \theta_{kj}) \text{from exponential family}\\
+
+&\text{Further assume that, for} \hspace{1mm}\theta=(\beta, \sigma)\\
+
+\hspace{5mm} &\tau_d | \mu_{\tau}, \Sigma_{\tau} \sim N_{D_W+1}(\mu_{\tau},\Sigma_{\tau}), \hspace{5mm} d=1, \dots, D_x+1\\
+
+\hspace{5mm} & \beta_{kj}|Z_{ik}, \tau, C_{ij},W \sim N_{D_X+1}([W_j^T\tau]^T, \Sigma_{\beta}), \hspace{5mm} j=1, \dots, J.
+
+\end{align}
+
+### (II) Estimation
+
+#### (A) **Algorithm 1:** Gibbs Sampler for hdpGLM with continuous outcome variable.\
+Set $Z^*$ as the unique values of $Z$, and $Z^{*C}$ as all the complement values of $Z^*$. We denote (i) $Z_j^*$ as the unique values of $Z$ in the context $j$; (ii) $Z_j^{*C}$ as the complement of $Z^*$ in $j$; (iii) $I_k$ as the set of indexes $i$ of the data points assigned to the cluster $k$; (iv) $N_k$ as the total number of data points in $k$ , and (v) $X_{jk}$ (or $y_{jk}$) as the covariates (outcome variable) of the observations $i$ in context $j$ and assigned to the cluster $k$.\
+
+ **Require:** $Z^{(t)}=(Z_1^{(t)}, \dots, Z_n^{(t)}), \theta_Z^{(t)}, \tau^{(t)}, \pi^{(t)}$\
+
+1. For $d \in \{1, \dots, D_x+1\}$, sample $\tau_d^{(t+1)}\mid \theta^{(t)}, \mathbf{W}\sim p(\theta_d^{(t)}\mid \mathbf{W},\tau_d^{(t)})p(\tau_d)$
+
+2. For $j=1, \dots, J$:
+ a. For all $k\in Z_j^*$, sample $\theta_{kj}^{(t+1)}\mid Z^{(t)}, \theta^{(t)}, \tau^{(t+1)}, \mathbf{X}, \mathbf{W}, C, y \sim p(\theta_{kj}\mid \tau^{(t+1)}, \mathbf{W})\prod_{i \in I_k} p(y_i \mid Z_{ik}^{(t)}, C_{ij}, X_i, \theta_{kj}^{(t)})$; \
+ b. For all $k\in Z_j^{*C}$, sample $\theta_{kj}^{(t+1)}\mid \tau^{(t+1)}, \mathbf{W} \sim p(\theta_{kj}\mid \tau^{(t+1)}, \mathbf{W})$
+3. For $i=1, \dots, n$, sample $Z_i^{(t+1)}| \theta^{(t+1)}, \pi^{(t)}, X_i, y \sim \sum_{k=1}^K p_{ik}\delta(Z_{ik}) \ni p_{ik} \propto \pi_k^{(t)}p(y_i\mid X_i, Z_{ik}^{(t)}, C_{ij}, \theta_{kj}^{(t+1)})$
+4. For $k=1, \dots, K-1$, sample $v_k^{(t+1)}\overset{iid}{\sim} Beta\Big(1+N_k^{(t+1)}, \alpha_0+\sum_{I=k+1}^KN_I^{(t+1)} \Big) \ni N_k^{(t+1)}=\sum_{i=1}^nI(Z_{ik}^{(t+1)})$\
+Set $v_K^{(t+1)}=1$ and compute $\pi_k^{(t+1)} = \left\{
+ \begin{array}{ll}
+ V_1^{(t+1)} & k=1 \\
+ V_k^{(t+1)}\prod_{I=1}^{K-1}(1-V_I)^{(t+1)} & k =2, \dots, K.
+ \end{array}
+ \right.$\
+
+#### (B) **Algorithm 2:** Riemman Manifold Hamiltonian Monte Carlo for hdpGLM with discrete outcome variable.
+The random variable of interest is $\beta_{kj} \in \mathbb{R}^{D_x+1}$ and we use $v \in \mathbb{R}^{D_x+1}$ as the ancillary variable (momentum) such that $v \sim N_{D_x+1}(0,G(\beta_{kj}))$. Denote (i) $l \in \{1, \dots, L\}$ as the leapfrog steps with size $\epsilon$; (ii) $H(\beta_{kj}, v)$ as the Hamiltonian; (iii) $U(\beta_{kj})$ as an element of the Hamiltonian model, which has gradient $\mathbb{\triangledown}_{\beta_{kj}}U(\beta_{kj})$ (For detailed definitions of each terms, please refer to https://doi.org/10.1017/pan.2019.13 .\
+
+**Requre:** $Z^{(t)}=(Z_1^{(t)}, \dots, Z_n^{(t)}), \beta^{(t)}, \tau^{(t)}, \pi^{(t)}$\
+
+1. **for** $j=1, \dots, J$ and $k\in Z_j^*$, **do**
+
+2. $\hspace{2mm}$ sample $v^{current} \sim N_{D_x+1}(0, G(\beta_{kj}))$
+
+3. $\hspace{2mm}$ Let $v \leftarrow v^{current}$, $\beta_{kj} \leftarrow \beta_{kj}^{(t)}$
+
+4. $\hspace{2mm}$ Set $v \leftarrow v-\frac{\epsilon}{2}\mathbb{\triangledown}_{\beta_{kj}}U(\beta_{kj})$
+
+5. $\hspace{2mm}$ **for** $l=L-1$ **do**
+
+6. $\hspace{4mm}$ $\beta_{kj} \leftarrow \beta_{kj}+\epsilon(G^{-1}(\beta_{kj})v)$
+
+7. $\hspace{4mm}$ $v \leftarrow v-\frac{\epsilon}{2}\mathbb{\triangledown}_{\beta_{kj}}U(\beta_{kj})$
+
+8. $\hspace{2mm}$ **end for**
+
+9. $\hspace{2mm}$ Set $v \leftarrow v-\frac{\epsilon}{2}\mathbb{\triangledown}_{\beta_{kj}}U(\beta_{kj})$
+
+10. $\hspace{2mm}$ Set $v=-v$
+
+11. $\hspace{2mm}$ sample $\mu \sim U(0,1)$
+
+12. $\hspace{2mm}$ **if** $\mu < min\Big\{1, exp\Big\{-H(\beta_{kj},v)+H(\beta_{kj}^{(t)},v^{current})\Big\}\Big\}$ **then**
+
+13. $\hspace{4mm}$ $\beta_{kj}^{(t+1)} \leftarrow \beta_{kj}$
+
+14. $\hspace{2mm}$**else**
+
+15. $\hspace{4mm}$ $\beta_{kj}^{(t+1)} \leftarrow \beta_{kj}^{(t)}$
+
+16. $\hspace{2mm}$ **end if**
+
+17. **end for**
+
+
+## Implementation
+The **hdpGLM** package provides a function hdpGLM() that takes relationship parameters (formula1
, formula2
), data source (data
), context indicator (context.id
), MCMC sampling parameters (mcmc,K, fix, epsilon,leapFrog, n.display, hmc_iter
), and the type of exponential distribution (family
). The hdpGLM estimation is conducted using MCMC Blocked Gibbs Sampler if the output variable is Gaussian distributed, as shown in Algorithm 1 in the previous session. And it uses MCMC Metropolis-Hastings inside Gibbs if the output variable is binomial or multinomial distributed as shown in Algorithm 2 in the previous session. The function returns a list of MCMC elements, including samples, pik, max_active, n.iter, burn.in, and time.elapsed. The samples element contains a MCMC object with the samples from the posterior distribution. And the pik is a n x K matrix with the estimated probabilities that the observation
+$i$ belongs to the cluster $k$.
+
+The basic structure of the function is:\
+```
+hdpGLM(formula1, formula2 = NULL, data, context.id = NULL, mcmc, K = 100, fix = NULL, family = "gaussian", epsilon = 0.01, leapFrog = 40, n.display = 1000, hmc_iter = 1)
+```
+
+formula1
specifies the symbolic relationship between the outcome variable and the set of individual-level covariates. formula2
specifies the symbolic relationship between the outcome variable and group-level covariates. The syntax for both formula1
& formula2
are the same as used in the lm function. data
receives a data.frame which contains all the variables utilized in formula1 and formula2. context.id
takes a vector that identifies the context of each observation. mcmc
takes a list of two integer values. One indicates the number of burn-in iterations of the MCMC, and the other indicates the number of iterations after burn in period for the MCMC. K
indicates the maximum number of clusters to truncate the Dirichlet
+Process Priors. fix
stores hyperparamters of the model. It should be either NULL or contain a vector named mu_beta, a squared matrix Sigma_beta, and single number alpha. if family ='gaussian', then it must also contains s2_sigma and df_sigma, both single numbers. If NULL, the defaults are mu_beta=0, Sigma_beta=diag(10), alpha=1, df_sigma=10, d2_sigma=10. family
can be ’gaussian’, ’binomial’, or ’multinomial'. epsilon
is a numeric used in the Stormer-Verlet Integrator (a.k.a leapfrog integrator) to solve the Hamiltonian Monte Carlo (HMC) when family='binomial' or family='multinomial'. leapFrog
specifies the number of steps taken at each iteration HMC for the Stormer-Verlet Integrator, and it is used when family='binomial' or family='multinomial'. n.display
indicates the iteration to display information about the estimation process. hmc_iter
indicates the number of HMC iterations for each Gibbs iteration and is only used when family='binomial' or family='multinomial'. The applicable default value for each input parameter is as listed.
+
+For instance, if one wants to investigate the effects of income and race on voters’ support for welfare policies in different countries. Suppose further that the effects vary by unobserved categories based on personal experiences with the class and racial conflict, and also vary by the degree of economic development and inequality of the country. Then one can use:
+
+```
+hdpGLM(support ~ income + race, support ~ development + inequality, data=data, mcmc=list(burn.in = 0, n.iter = 100))
+```
+The output of the function then tells us the estimated heterogeneous marginal effects of income and race on welfare policy support.
+
+
+## Example\
+
+### (I) Simulating Data
+
+To illustrate how to use the **hdpGLM** package, let's simulate an example data set called data. This dataset is derived from two data contexts, each of which contains $20$ observations, $2$ individual-level covariates, and $1$ group-level covariate. While the first data context holds $3$ heterogeneous groups, the second context holds $2$. The outcome variable y depends on the two individual-level covariates, group categories, data contexts, as well as unpredictable random shocks.
+```{r}
+library(magrittr)
+# Note: this example is just for illustration. MCMC iterations are very reduced
+set.seed(10)
+n = 20
+
+data.context1 = tibble::data_frame(
+x1 = rnorm(n, -3),
+x2 = rnorm(n, 3),
+z = sample(1:3, n, replace=TRUE),
+y =I(z==1) * (3 + 4*x1 - x2 + rnorm(n)) +
+I(z==2) * (3 + 2*x1 + x2 + rnorm(n)) +
+I(z==3) * (3 - 4*x1 - x2 + rnorm(n)) ,
+w = 20
+)
+
+data.context2 = tibble::data_frame(x1 = rnorm(n, -3),
+x2 = rnorm(n, 3),
+z = sample(1:2, n, replace=TRUE),
+y =I(z==1) * (1 + 3*x1 - 2*x2 + rnorm(n)) +
+I(z==2) * (1 - 2*x1 + x2 + rnorm(n)),
+w = 10
+)
+
+data = data.context1 %>%
+dplyr::bind_rows(data.context2)
+
+data$beta0=0
+data$beta1=0
+data$beta2=0
+data$beta0[which(data$w==20)]=3
+data$beta0[which(data$w==10)]=1
+data$beta1[which(data$w==20 & data$z==1)]=4
+data$beta1[which(data$w==20 & data$z==2)]=2
+data$beta1[which(data$w==20 & data$z==3)]=-4
+data$beta1[which(data$w==10 & data$z==1)]=3
+data$beta1[which(data$w==10 & data$z==2)]=-2
+
+data$beta2[which(data$w==20 & data$z==1)]=-1
+data$beta2[which(data$w==20 & data$z==2)]=1
+data$beta2[which(data$w==20 & data$z==3)]=-1
+data$beta2[which(data$w==10 & data$z==1)]=-2
+data$beta2[which(data$w==10 & data$z==2)]=1
+
+tau0<-lm(beta0~w, data=data)
+tau1<-lm(beta1~w, data=data)
+tau2<-lm(beta2~w, data=data)
+
+```
+
+
+
+
+### (II) Effect Estimation Using hdpGLM()
+Let’s specify the number of burn-in iterations to be $1$, and the number of iterations after the burn-in period to be $50$ for the MCMC parameter. The heterogeneous effects of x1 and x2 on y can be estimated using the hdpGLM() function provided by the **hdpGLM** package as followed.
+
+```{r}
+## estimation
+library(hdpGLM)
+mcmc = list(burn.in=1, n.iter=50)
+samples = hdpGLM(y ~ x1 + x2, y ~ w, data=data, mcmc=mcmc, n.display=0)
+```
+
+### (III) Visualizing the results
+The package provides a summary function that produces estimated heterogeneous marginal effects and their corresponding inference statistics. It also contains plot functions to help users visualize the effects of interest.
+
+```{r, fig.width=7.2, fig.height=5}
+true.beta=data.frame(j=c(1,1,1,1,1,1,1,1,1,2,2,2,2,2,2), k=c(1,1,1,2,2,2,3,3,3,1,1,1,2,2,2), Parameter=c("beta[0]", "beta[1]", "beta[2]", "beta[0]", "beta[1]", "beta[2]", "beta[0]", "beta[1]", "beta[2]", "beta[0]", "beta[1]", "beta[2]", "beta[0]", "beta[1]", "beta[2]"), True=c(3,4,-1,3,2,1,3,-4,-1,1,3,-2,1,-2,1), w=c(20,20,20,20,20,20,20,20,20,10,10,10,10,10,10))
+
+true.tau=data.frame(w=c(0,0,0,1,1,1), beta=c(0,1,2,0,1,2), Parameter=c("tau[0][0]", "tau[0][1]", "tau[0][2]", "tau[1][0]", "tau[1][1]", "tau[1][2]"), True=c(tau0$coefficients[1],tau1$coefficients[1],tau2$coefficients[1],tau0$coefficients[2],tau1$coefficients[2],tau2$coefficients[2]))
+
+summary(samples)
+
+
+plot_beta(samples, X =c ('x1','x2'), context.id = NULL, true.beta = true.beta,
+title = "posterior distribution of beta for each context", plot.mean = TRUE,
+plot.grid = TRUE, showKhat = TRUE, col = NULL,
+xlab.size = NULL, ylab.size = NULL, title.size = 2,
+legend.size = NULL, xtick.distance = NULL, left.margin = 0,
+ytick.distance = NULL, col.border = "white")
+
+plot_hdpglm(samples, ncol.taus=4, ncol.betas=4, tau.x.axis.size = .7,true.tau=true.tau,
+tau.y.axis.size = .7,beta.x.axis.size = .7, beta.y.axis.size = .7, tau.title.size = .5)
+```
+
+## Reference
+
diff --git a/vignettes/references.bib b/vignettes/references.bib
new file mode 100644
index 0000000..3e96d64
--- /dev/null
+++ b/vignettes/references.bib
@@ -0,0 +1,160 @@
+@article{mukhopadhyay1997dirichlet,
+ title={Dirichlet process mixed generalized linear models},
+ author={Mukhopadhyay, Saurabh and Gelfand, Alan E},
+ journal={journal of the American Statistical Association},
+ volume={92},
+ number={438},
+ pages={633--639},
+ year={1997},
+ publisher={Taylor \& Francis Group}
+}
+
+@article{kleinman1998semi,
+ title={A semi-parametric Bayesian approach to generalized linear mixed models},
+ author={Kleinman, Ken P and Ibrahim, Joseph G},
+ journal={Statistics in Medicine},
+ volume={17},
+ number={22},
+ pages={2579--2596},
+ year={1998},
+ publisher={Wiley Online Library}
+}
+
+@article{kleinman1998semiparametric,
+ title={A semiparametric Bayesian approach to the random effects model},
+ author={Kleinman, Ken P and Ibrahim, Joseph G},
+ journal={Biometrics},
+ pages={921--938},
+ year={1998},
+ publisher={JSTOR}
+}
+
+
+@article{dorazio2008modeling,
+ title={Modeling unobserved sources of heterogeneity in animal abundance using a Dirichlet process prior},
+ author={Dorazio, Robert M and Mukherjee, Bhramar and Zhang, Li and Ghosh, Malay and Jelks, Howard L and Jordan, Frank},
+ journal={Biometrics},
+ volume={64},
+ number={2},
+ pages={635--644},
+ year={2008},
+ publisher={Wiley Online Library}
+}
+
+@article{gill2009nonparametric,
+ title={Nonparametric priors for ordinal Bayesian social science models: Specification and estimation},
+ author={Gill, Jeff and Casella, George},
+ journal={Journal of the American Statistical Association},
+ volume={104},
+ number={486},
+ pages={453--454},
+ year={2009},
+ publisher={Taylor \& Francis}
+}
+
+@article{heinzl2013clustering,
+ title={Clustering in linear mixed models with approximate Dirichlet process mixtures using EM algorithm},
+ author={Heinzl, Felix and Tutz, Gerhard},
+ journal={Statistical Modelling},
+ volume={13},
+ number={1},
+ pages={41--67},
+ year={2013},
+ publisher={Sage Publications Sage India: New Delhi, India}
+}
+
+
+@article{stegmueller2013modeling,
+ title={Modeling dynamic preferences: A Bayesian robust dynamic latent ordered probit model},
+ author={Stegmueller, Daniel},
+ journal={Political Analysis},
+ pages={314--333},
+ year={2013},
+ publisher={JSTOR}
+}
+
+
+@article{traunmuller2015modeling,
+ title={Modeling latent information in voting data with dirichlet process priors},
+ author={Traunm{\"u}ller, Richard and Murr, Andreas and Gill, Jeff},
+ journal={Political Analysis},
+ pages={1--20},
+ year={2015},
+ publisher={JSTOR}
+}
+
+
+@article{ferrari2020modeling,
+ title={Modeling Context-Dependent Latent Effect Heterogeneity},
+ author={Ferrari, Diogo},
+ journal={Political Analysis},
+ volume={28},
+ number={1},
+ pages={20--46},
+ year={2020},
+ publisher={Cambridge University Press}
+}
+
+
+
+
+
+@article{mallick1997combining,
+ title={Combining information from several experiments with nonparametric priors},
+ author={Mallick, Bani K and Walker, Stephen G},
+ journal={Biometrika},
+ volume={84},
+ number={3},
+ pages={697--706},
+ year={1997},
+ publisher={Oxford University Press}
+}
+
+@article{carota2002semiparametric,
+ title={Semiparametric regression for count data},
+ author={Carota, Cinzia and Parmigiani, Giovanni},
+ journal={Biometrika},
+ volume={89},
+ number={2},
+ pages={265--281},
+ year={2002},
+ publisher={Oxford University Press}
+}
+
+@article{de2004anova,
+ title={An ANOVA model for dependent random measures},
+ author={De Iorio, Maria and M{\"u}ller, Peter and Rosner, Gary L and MacEachern, Steven N},
+ journal={Journal of the American Statistical Association},
+ volume={99},
+ number={465},
+ pages={205--215},
+ year={2004},
+ publisher={Taylor \& Francis}
+}
+
+@article{muller2004method,
+ title={A method for combining inference across related nonparametric Bayesian models},
+ author={M{\"u}ller, Peter and Quintana, Fernando and Rosner, Gary},
+ journal={Journal of the Royal Statistical Society: Series B (Statistical Methodology)},
+ volume={66},
+ number={3},
+ pages={735--749},
+ year={2004},
+ publisher={Wiley Online Library}
+}
+@inproceedings{teh2005sharing,
+ title={Sharing clusters among related groups: Hierarchical Dirichlet processes},
+ author={Teh, Yee W and Jordan, Michael I and Beal, Matthew J and Blei, David M},
+ booktitle={Advances in neural information processing systems},
+ pages={1385--1392},
+ year={2005}
+}
+
+@article{sethuraman1994constructive,
+ title={A constructive definition of Dirichlet priors},
+ author={Sethuraman, Jayaram},
+ journal={Statistica sinica},
+ pages={639--650},
+ year={1994},
+ publisher={JSTOR}
+}
\ No newline at end of file
diff --git a/vignettes/rsconnect/documents/Report.Rmd/rpubs.com/rpubs/Document.dcf b/vignettes/rsconnect/documents/Report.Rmd/rpubs.com/rpubs/Document.dcf
new file mode 100644
index 0000000..a3c890b
--- /dev/null
+++ b/vignettes/rsconnect/documents/Report.Rmd/rpubs.com/rpubs/Document.dcf
@@ -0,0 +1,10 @@
+name: Document
+title:
+username:
+account: rpubs
+server: rpubs.com
+hostUrl: rpubs.com
+appId: https://api.rpubs.com/api/v1/document/654472/12117752e7a24a1e8831f62ef86a789d
+bundleId: https://api.rpubs.com/api/v1/document/654472/12117752e7a24a1e8831f62ef86a789d
+url: http://rpubs.com/publish/claim/654472/59bd706d85bf49a6abcdeb678772e1a4
+when: 1598895679.25725
diff --git a/vignettes/rsconnect/documents/Report.Rmd/rpubs.com/rpubs/Publish Document.dcf b/vignettes/rsconnect/documents/Report.Rmd/rpubs.com/rpubs/Publish Document.dcf
new file mode 100644
index 0000000..605aff3
--- /dev/null
+++ b/vignettes/rsconnect/documents/Report.Rmd/rpubs.com/rpubs/Publish Document.dcf
@@ -0,0 +1,10 @@
+name: Publish Document
+title:
+username:
+account: rpubs
+server: rpubs.com
+hostUrl: rpubs.com
+appId: https://api.rpubs.com/api/v1/document/654472/12117752e7a24a1e8831f62ef86a789d
+bundleId: https://api.rpubs.com/api/v1/document/654472/12117752e7a24a1e8831f62ef86a789d
+url: http://rpubs.com/publish/claim/654472/59bd706d85bf49a6abcdeb678772e1a4
+when: 1598895782.60494