---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>",
  fig.path = "man/figures/README-",
  out.width = "100%"
)
```
# ERPM
<!-- badges: start -->
<!-- badges: end -->
ERPM is the software implementation of the statistical model outlined in:
Hoffman, M., Block, P., & Snijders, T. A. (2023). Modeling partitions of individuals. Sociological Methodology, 53(1), 1-41.
[Link to the study in Sociological Methodology](https://doi.org/10.1177/00811750221145166).
The model allows for the analysis of emergent group compositions in partitions, i.e., sets of non-overlapping groups. For a given partition of individuals (or nodes, to follow the language of network analysis), we can use this model to understand group formation processes that led to the observation of this partition, based on individual attributes, relations between individuals, and size-related factors. It can be seen as an extension of Exponential Random Graph Models (ERGMs) to the case of partition objects. With this package, one can either simulate this model or estimate its parameters for a given dataset.
This package also provides a longitudinal extension of the model, for a list of partitions, where each partition depends on the previous partitions. It follows the definition proposed in:
Hoffman, M., & Chabot, T. (2023). The role of selection in socioeconomic homophily: Evidence from an adolescent summer camp. Social Networks, 74, 259-274.
The **ERPM Manual** is available on GitHub in the documentation folder.
# Note from the developers
The package and its documentation may still contain bugs or errors, or may not yet support what you want to do.
In that case, or if you are unsure, please create an issue here or email the package maintainer directly (marion.hoffman[at]iast.fr).
We are currently (Mar 2024) on version 0.1.0 on GitHub.
# Installation
You can install ERPM either from GitHub or from CRAN.
``` r
# from GitHub:
# install.packages("remotes")
remotes::install_github("stocnet/ERPM")
```
``` r
# from CRAN:
install.packages("ERPM")
```
# Cross-sectional example
In this section, we outline a simple example with synthetic data.
```{r}
library(ERPM)
```
## The Data
Let us define a set of n = 6 nodes with three attributes (label, gender, and age), and an arbitrary covariate matrix (friendship).
We create the following data frame.
```{r}
n <- 6
nodes <- data.frame(label = c("A","B","C","D","E","F"),
                    gender = c(1,1,2,1,2,2),
                    age = c(20,22,25,30,30,31))
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
                       1, 0, 0, 0, 1, 0,
                       1, 0, 0, 0, 1, 0,
                       1, 0, 0, 0, 0, 0,
                       0, 1, 1, 0, 0, 1,
                       0, 0, 0, 0, 1, 0), 6, 6, TRUE)
```
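Before going further, it can help to sanity-check the covariate matrix. The following base-R sketch verifies that the example matrix has one row and column per node, is symmetric, and has an empty diagonal. These are properties of this particular example, not documented requirements of the package, and the helper names (`is_square_n`, `is_symmetric`, `has_empty_diagonal`) are introduced here for illustration only.

```r
n <- 6
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
                       1, 0, 0, 0, 1, 0,
                       1, 0, 0, 0, 1, 0,
                       1, 0, 0, 0, 0, 0,
                       0, 1, 1, 0, 0, 1,
                       0, 0, 0, 0, 1, 0), n, n, byrow = TRUE)

# One row and one column per node
is_square_n <- all(dim(friendship) == c(n, n))
# Friendship is treated as undirected in this example
is_symmetric <- isSymmetric(friendship)
# No self-ties
has_empty_diagonal <- all(diag(friendship) == 0)

# Stop early if any of the checks fails
stopifnot(is_square_n, is_symmetric, has_empty_diagonal)
```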
We consider a partition for these 6 individuals. We define a vector with six elements, indicating the id of each individual's group.
```{r}
partition <- c(1,1,2,2,2,3)
```
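To make the encoding concrete, the group sizes and group members implied by this vector can be inspected with base R alone. This is a small illustrative sketch; the names `group_sizes` and `members` are not part of the package.

```r
labels <- c("A","B","C","D","E","F")
partition <- c(1,1,2,2,2,3)

# Number of individuals in each group, indexed by group id
group_sizes <- table(partition)

# Labels of the individuals in each group, as a named list
members <- split(labels, partition)

group_sizes
members
```

Here, individuals A and B form group 1, C, D, and E form group 2, and F is alone in group 3.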
## Model specification
First, we need to choose the effects (i.e., explanatory variables) we want to include. For example, we set four (which is of course too many to be reasonable for 6 nodes):
1. "num_groups": tendency to form more groups
2. "same" (for the gender covariate): tendency to form groups with individuals with the same gender
3. "diff" (for the age covariate): tendency to form groups with individuals with high age differences
4. "tie" (for the friendship covariate): tendency to form groups with friends
```{r}
effects <- list(names = c("num_groups","same","diff","tie"),
                objects = c("partition","gender","age","friendship"))
objects <- list()
objects[[1]] <- list(name = "friendship", object = friendship)
```
The `effects` list should contain the names of pre-written functions in the package (see the manual for all effect names) as well as the objects they refer to (either the partition or covariates). When objects are not individual covariates, we need to create an additional list to store these extra objects.
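To build intuition for what these four effects measure, the base-R sketch below computes plausible versions of the corresponding statistics by hand for the example partition. It assumes simple pairwise definitions (counts of same-gender pairs, sums of absolute age differences, and counts of friendship ties, all within groups); the exact definitions and normalizations used by the package may differ, so refer to the manual for the authoritative formulas. All `stat_*` names are hypothetical.

```r
partition <- c(1,1,2,2,2,3)
gender <- c(1,1,2,1,2,2)
age <- c(20,22,25,30,30,31)
friendship <- matrix(c(0, 1, 1, 1, 0, 0,
                       1, 0, 0, 0, 1, 0,
                       1, 0, 0, 0, 1, 0,
                       1, 0, 0, 0, 0, 0,
                       0, 1, 1, 0, 0, 1,
                       0, 0, 0, 0, 1, 0), 6, 6, byrow = TRUE)

# TRUE for each unordered pair (i < j) of nodes sharing a group
same_group <- outer(partition, partition, "==")
pairs <- same_group & upper.tri(same_group)

# Assumed statistic for "num_groups": number of groups
stat_num_groups <- length(unique(partition))
# Assumed statistic for "same": same-gender pairs within groups
stat_same <- sum(pairs & outer(gender, gender, "=="))
# Assumed statistic for "diff": sum of absolute age differences within groups
stat_diff <- sum(abs(outer(age, age, "-"))[pairs])
# Assumed statistic for "tie": friendship ties within groups
stat_tie <- sum(friendship[pairs])

c(num_groups = stat_num_groups, same = stat_same,
  diff = stat_diff, tie = stat_tie)
```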
## Estimation
The parameters of the model can be estimated using maximum likelihood estimation (equivalent to method of moments estimation). For more details on the parametrization of the estimation algorithm, see the manual.
```{r}
estimation <- estimate_ERPM(partition,
                            nodes,
                            objects,
                            effects,
                            startingestimates = c(-1.5,0.2,-0.2,0.2),
                            burnin = 100,
                            thining = 20,
                            length.p1 = 500, # number of samples in phase 1
                            multiplication.iter.p2 = 20, # multiplication factor for the number of iterations in phase 2 subphases
                            num.steps.p2 = 4, # number of phase 2 subphases
                            length.p3 = 1000) # number of samples in phase 3
estimation$results
```
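The equivalence between maximum likelihood and the method of moments mentioned at the start of this section follows from the exponential-family form of the model. As a sketch, write $s(P)$ for the vector of statistics of a partition $P$, $\theta$ for the parameter vector, and $\kappa(\theta)$ for the normalizing constant (this notation is assumed here for illustration and is not taken from the package):

```latex
\Pr_\theta(P) = \frac{\exp\left(\theta^\top s(P)\right)}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{P'} \exp\left(\theta^\top s(P')\right)

\frac{\partial}{\partial \theta} \log \Pr_\theta(P_{\mathrm{obs}})
  = s(P_{\mathrm{obs}}) - \mathbb{E}_\theta\left[ s(P) \right] = 0
```

The maximum likelihood estimate is therefore the value of $\theta$ at which the expected statistics equal the observed ones, which is exactly the moment equation.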
### Simulation
We can check how the model reproduces statistics of the observed data by simulating the estimated model. We can also use this function to simulate theoretical models.
```{r}
nsimulations <- 1000
simulations <- draw_Metropolis_single(theta = estimation$results$est,
                                      first.partition = partition,
                                      nodes = nodes,
                                      effects = effects,
                                      objects = objects,
                                      burnin = 100,
                                      thining = 20,
                                      num.steps = nsimulations,
                                      neighborhood = c(1,1,1),
                                      sizes.allowed = 1:n,
                                      sizes.simulated = 1:n,
                                      return.all.partitions = TRUE)
```
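One common way to judge how well a model reproduces the observed statistics is to compare each observed value with the mean and standard deviation of its simulated counterpart. The sketch below shows this comparison in base R. The helper `gof_summary()` is hypothetical (not part of the package), and the toy numbers are stand-in values chosen only to show the mechanics; they are not output from ERPM. In practice, the matrix of simulated statistics would be extracted from the simulation output (see the manual for its structure).

```r
# Compare observed statistics with simulated ones:
# rows of `simulated` are draws, columns are statistics.
gof_summary <- function(observed, simulated) {
  sim_mean <- colMeans(simulated)
  sim_sd <- apply(simulated, 2, sd)
  data.frame(observed = observed,
             sim_mean = sim_mean,
             sim_sd = sim_sd,
             t_ratio = (observed - sim_mean) / sim_sd)
}

# Stand-in values for illustration only (NOT produced by ERPM)
toy_simulated <- cbind(num_groups = c(2, 3, 4, 3),
                       same = c(1, 2, 3, 2))
toy_observed <- c(num_groups = 3, same = 2)

gof <- gof_summary(toy_observed, toy_simulated)
gof
```

A ratio close to zero indicates that the simulated statistic is centered on the observed value.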
### Log-likelihood and AIC
Finally, we can estimate the log-likelihood and AIC of the model (useful for comparing two models, for example). First, we need to compute the ML estimate of a simple model with only one parameter, for the number of groups (this parameter should be in the model!).
```{r}
likelihood_function <- function(x){ exp(x*max(partition)) / compute_numgroups_denominator(n,x)}
curve(likelihood_function, from=-2, to=0)
parameter_base <- optimize(likelihood_function, interval=c(-2, 0), maximum=TRUE)
parameters_basemodel <- c(parameter_base$maximum,0,0,0)
```
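For a model with only the number of groups as a statistic, the normalizing constant can be written in closed form, because the number of partitions of n nodes into k groups is the Stirling number of the second kind S(n, k). The base-R sketch below reproduces the optimization above under the assumption that the package's denominator corresponds to the sum of S(n, k) exp(theta k) over k; this is an assumption about the package internals, and the helper names (`stirling2`, `denominator`, `likelihood`) are hypothetical.

```r
# Stirling numbers of the second kind S(n, 1), ..., S(n, n),
# using the recurrence S(i, k) = k * S(i - 1, k) + S(i - 1, k - 1)
stirling2 <- function(n) {
  S <- matrix(0, n + 1, n + 1)  # S[i + 1, k + 1] holds S(i, k)
  S[1, 1] <- 1
  for (i in 1:n) {
    for (k in 1:i) {
      S[i + 1, k + 1] <- k * S[i, k + 1] + S[i, k]
    }
  }
  S[n + 1, 2:(n + 1)]
}

n <- 6
observed_groups <- 3  # number of groups in the example partition
counts <- stirling2(n)

# Assumed normalizing constant of the one-parameter model
denominator <- function(theta) sum(counts * exp(theta * (1:n)))
likelihood <- function(theta) exp(theta * observed_groups) / denominator(theta)

fit <- optimize(likelihood, interval = c(-2, 0), maximum = TRUE)

# At the ML estimate, the expected number of groups matches the observed one
expected_groups <- sum((1:n) * counts * exp(fit$maximum * (1:n))) /
  denominator(fit$maximum)
```

With theta = 0 the denominator is simply the total number of partitions of 6 nodes, and at the optimum the expected number of groups is approximately equal to the observed value of 3.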
Then we can get our estimated logL and AIC.
```{r}
logL_AIC <- estimate_logL(partition,
                          nodes,
                          effects,
                          objects,
                          theta = estimation$results$est,
                          theta_0 = parameters_basemodel,
                          M = 3,
                          num.steps = 200,
                          burnin = 100,
                          thining = 20)
logL_AIC$logL
logL_AIC$AIC
```
# More ...
For more details on the longitudinal version of the model or other functions, have a look at the manual in the documentation folder or the example script in the scripts folder.