-
Notifications
You must be signed in to change notification settings - Fork 0
/
10. SEM.Rmd
176 lines (118 loc) · 6.79 KB
/
10. SEM.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
---
title: "MESA8668 - Structural Equation Modelling"
author: "Gulsah Gurkan"
output:
pdf_document:
fontsize: 12pt
margins: 1in
header-includes:
- \setlength\parindent{24pt}
- \usepackage{chngcntr}
- \counterwithin{figure}{section}
- \counterwithin{table}{section}
- \usepackage{setspace}
- \usepackage{array}
- \usepackage{graphicx}
- \usepackage{caption}
- \usepackage{fancyhdr}
- \pagestyle{fancy}
- \fancyhf{}
- \rhead{}
- \lhead{SEM R Lab}
- \rfoot{Page \thepage}
- \renewcommand{\headrulewidth}{0pt}
---
```{r, setup, include=FALSE}
knitr::opts_chunk$set(
echo=FALSE, message=FALSE, warning=FALSE, comment=NA,
fig.height=8, fig.width=8
)
options(max.print = 2000, tibble.print_max = 100)
```
```{r results='hide'}
#empty environment.
rm(list = ls())
# Set directory.
setwd("...")
```
## Overview of the dataset
For this analysis, we will be using the tutorial that is also available here:
http://lavaan.ugent.be/tutorial/sem.html.
The dataset is provided in lavaan package, called "PoliticalDemocracy". We will be able to use this dataset by calling it after loading the package in the current R session. PoliticalDemocracy dataset involves 75 observations and 11 variables.
```{r}
library(lavaan)
names(PoliticalDemocracy) # variable names.
```
y1: Expert ratings of the freedom of the press in 1960
y2: The freedom of political opposition in 1960
y3: The fairness of elections in 1960
y4: The effectiveness of the elected legislature in 1960
y5: Expert ratings of the freedom of the press in 1965
y6: The freedom of political opposition in 1965
y7: The fairness of elections in 1965
y8: The effectiveness of the elected legislature in 1965
x1: The gross national product (GNP) per capita in 1960
x2: The inanimate energy consumption per capita in 1960
x3: The percentage of the labor force in industry in 1960
\newpage
## Step 1. Model specification
The variables used in this study are _industrialization index in 1960 (ind60)_ which is measured by three observed variables (X1, X2, and X3); _political democracy index in 1960 (dem60)_ which is measured by four observed variables (Y1, Y2, Y3, and Y4); and _political democracy index in 1965 (dem65)_ which is measured by four observed variables (Y5, Y6, Y7, and Y8).
While taking the measurement error in the observed variables into account, it is aimed to investigate whether industrialization in 1960 is a significant predictor for political democracy in 1960. Additionally, it is also examined whether industrialization in 1960 and political democracy in 1960 together explain any variance in political democracy in 1965. The observed variables which are assumed to be correlated independent of the latent variables that they belong to are allowed to be correlated (seen under "residual correlations" below).
In total, there are 66 possible elements to be estimated $(k*[k+1]/2 = 11*[11+1]/2 = 66)$ where k is the total number of exogeneous (observed) variables. Latent variables are also called endogeneous variables in the SEM framework.
```{r}
model <- '
# measurement models
ind60 =~ x1 + x2 + x3
dem60 =~ y1 + y2 + y3 + y4
dem65 =~ y5 + y6 + y7 + y8
# regressions
dem60 ~ ind60
dem65 ~ ind60 + dem60
# residual correlations
y1 ~~ y5
y2 ~~ y4 + y6
y3 ~~ y7
y4 ~~ y8
y6 ~~ y8
'
```
\newpage
## Step 2. Model Estimation
```{r}
fit <- sem(model, data=PoliticalDemocracy)
# Path diagram for the model.
semPlot::semPaths(object = fit, what = "est", title = FALSE, layout="spring",
intercepts = FALSE,
bg = "lightskyblue", edge.color = "black", edge.label.cex = 0.75,
asize = 2, esize = 2,
color = list(lat = rgb(255, 131, 205, maxColorValue = 255),
man = rgb(155, 253, 175, maxColorValue = 255),
int = rgb(255, 255, 255, maxColorValue = 255)))
```
\newpage
## Step 3. Model interpretation
As mentioned before, there are $66$ parameters that can be estimated with the total number of variables in the model. The factor loading of the first observed variable in each factor is fixed to be 1 and not estimated based on the current model specification. Therefore, $11-3=8$ factor loadings being estimated. Moreover, $3$ regression coefficients are estimated based on the structural model. All observed/exogeneous variables and the endogeneous variables have residuals that are also being estimated, that is, $11+3=14$ parameters. There are $6$ residuals that are allowed to be correlated and so estimated by the specified model. In total, $8+3+14+6=31$ parameters being estimated and the degrees of freedom is $66-31=35$. These can be seen by calling parTable() function in lavaan.
```{r}
# Parameters.
par <- parTable(fit)
par[, c(1:4, 8:9, 12:15)]
```
### Model output
Setting the arguments "standardized=T" and "fit.measures=T" will help getting the standardized solution for the estimates as well as model-data fit statistics.
```{r}
# Summary of the results.
summary(fit, standardized=T, fit.measures = T)
```
#### Model fit indices
As mentioned in the CFA R lab, it is suggested to report the following indices; chi-square, RMSEA, CFI, SRMR.
(https://www.cscu.cornell.edu/news/Handouts/SEM_fit.pdf)
The Chi-square value is not significant ($X^2 = 38.125, df=35, p=0.329$) which suggests that the observed data is not significantly different than predicted by the model, that is, the model fit is good. RMSEA is $0.035$ and SRMR is $0.044$, and these are smaller than the suggested cut-off value of $0.08$. Finally, CFI is $0.995$ which is larger than the suggested cut-off value of $0.95$. Overall, the model fit indices suggest that the model fits the data well.
\newpage
#### Results for the research questions
__RQ1:__ Is there a significant relationship between industrialization in 1960 and political democracy in 1960?
The regression coefficient between ind60 and dem60 is $1.483$ and it is significant at the $0.05$ alpha level ($p<.001$). Therefore, we can conclude that ind60 is a significant predictor of dem60. Every one unit of increase in ind60 is predicted to increase dem60 by $1.483$.
__RQ2:__ Is there a significant relationship between industrialization in 1960 and political democracy in 1965?
The regression coefficient between ind60 and dem65 is $0.572$ and it is significant at the $0.05$ alpha level ($p<.05$). Therefore, we can conclude that ind60 is a significant predictor of dem65. Every one unit of increase in ind60 is predicted to increase dem65 by $0.572$.
__RQ3:__ Is there a significant relationship between political democracy in 1960 and political democracy in 1965?
The regression coefficient between dem60 and dem65 is $0.837$ and it is significant at the $0.05$ alpha level ($p<.001$). Therefore, we can conclude that dem60 is a significant predictor of dem65. Every one unit of increase in dem60 is predicted to increase dem65 by $0.837$.