  ___  ____  ____  ____  ____(R)
 /__    /   ____/   /   ____/
___/   /   /___/   /   /___/
  Statistics/Data Analysis
Title
psweight -- IPW- and CBPS-type propensity score reweighting, with various
extensions
Syntax
. psweight subcmd tvar tmvarlist [if] [in] [weight] [, stat penalty
variance options]
. psweight balance tvar tmvarlist [if] [in] [weight] [,
mweight(varname) variance options]
. psweight call [v = ] [classfunction() | classvariable]
tvar must contain values 0 or 1, representing the treatment (1) and
comparison (0) groups.
tmvarlist specifies the variables that predict treatment assignment
in the treatment model.
classfunction() is a function of the psweight Mata class (the
functions may take arguments), classvariable is a member
variable, and v is a new Mata variable; see details below.
subcmd         Description
-------------------------------------------------------------------------
ipw            logit regression
cbps           just-identified covariate-balancing propensity score
cbpsoid        over-identified covariate-balancing propensity score
pcbps          penalized covariate-balancing propensity score
mean_sd_sq     minimize mean(stddiff())^2
sd_sq          minimize sum(stddiff()^2)
stdprogdiff    minimize sum(stdprogdiff()^2)
mean_sd        synonym for mean_sd_sq
sd             synonym for sd_sq
-------------------------------------------------------------------------
Only one subcmd may be specified.
stat           Description
-------------------------------------------------------------------------
ate            estimate average treatment effect in population; the
                 default
atet           estimate average treatment effect on the treated
ateu           estimate average treatment effect on the untreated
-------------------------------------------------------------------------
Only one stat may be specified.
penalty               Description
-------------------------------------------------------------------------
cvtarget(# # #)       applies penalty using the coefficient of variation
                        of the weight distribution; default is no
                        penalty: cvtarget(0 0 2)
skewtarget(# # #)     applies penalty using the skewness of the weight
                        distribution; default is no penalty:
                        skewtarget(0 0 2)
kurttarget(# # #)     applies penalty using the excess kurtosis of the
                        weight distribution; default is no penalty:
                        kurttarget(0 0 2)
-------------------------------------------------------------------------
One or more penalty options may be specified.
variance              Description
-------------------------------------------------------------------------
pooledvariance        uses the pooled (treatment plus control) sample's
                        variances to calculate standardized differences;
                        the default
controlvariance       uses the control group's variances to calculate
                        standardized differences
treatvariance         uses the treatment group's variances to calculate
                        standardized differences
averagevariance       uses (the control group's variances + treatment
                        group's variances)/2 to calculate standardized
                        differences
-------------------------------------------------------------------------
Only one variance may be specified.
options               Description
-------------------------------------------------------------------------
depvarlist(varlist)   outcome variables
display_options       control columns and column formats, row spacing,
                        line width, display of omitted variables and base
                        and empty cells, and factor-variable labeling
maximize_options      control the maximization process; seldom used
ntable                display a table with sample sizes
coeflegend            display legend instead of statistics
-------------------------------------------------------------------------
tmvarlist may contain factor variables; see fvvarlists.
Sample weights may be specified as fweights or iweights; see weight.
iweights are treated the same as fweights.
Description
psweight is a Stata command that offers Stata users easy access to the
psweight Mata class.
psweight subcmd computes inverse-probability weighting (IPW) weights for
average treatment effect, average treatment effect on the treated, and
average treatment effect on the untreated estimators for observational
data. IPW estimators use estimated probability weights to correct for
the missing data on the potential outcomes. Probabilities of
treatment--propensity scores--are computed for each observation with one
of a variety of methods, including logistic regression (traditional IPW),
covariate-balancing propensity scores (CBPS), penalized
covariate-balancing propensity scores (PCBPS), prognostic score-balancing
propensity scores, and other methods.
psweight balance constructs a balance table without computing IPW
weights. The most common use case is when you wish to construct a
balance table for the unweighted sample. However, you can also construct
a balance table with user-supplied weights.
After running psweight you can apply class functions to your data or
access results through psweight call; see details below.
Remarks
psweight subcmd constructs several variables: _pscore, _treated,
_weight_mtch, and _weight. If these variables exist before running
psweight, they will be replaced.
psweight subcmd solves for propensity score model coefficients,
propensity scores, and IPW weights as follows:
The first step involves computing coefficients for the propensity
score model, b. The propensity score model takes the form of a logit
regression model. Specifically, the propensity score for each row in
the data is defined as
p = invlogit(X * b')
where X is the vector of matching variables (tmvarlist) for the
respective row.
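As a minimal illustration of the formula above (not the package's Mata code), the propensity score for a single row can be sketched in Python. The coefficient and covariate values are made up, and treating the first element of X as a constant term is an assumption of this example.

```python
import math

def invlogit(z):
    """Inverse logit, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def pscore(x_row, b):
    """Propensity score p = invlogit(x_row . b) for one observation."""
    return invlogit(sum(x * coef for x, coef in zip(x_row, b)))

# Hypothetical values: a constant plus two matching variables.
b = [-1.0, 0.5, 0.25]
x_row = [1.0, 2.0, 0.0]
p = pscore(x_row, b)  # linear index is -1 + 1 + 0 = 0, so p = 0.5
```

Whatever b the chosen subcmd produces, the same formula maps it to a probability strictly between 0 and 1 for moderate values of the linear index.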
You specify a subcmd to control how the vector b is computed in the
internal numerical optimization problem. As discussed in Kranker,
Blue, and Vollmer Forrow (2019), we can set up optimization problems
to solve for the b that produces the best fit in the propensity score
model, the b that produces the best balance on matching variables,
the b that produces the best balance on prognostic scores, or
something else. The subcmd also determines how the term "best
balance" is defined in the previous sentence. That is, for a given
subcmd, we can generically define b as the vector that solves the
problem:
b = argmin L(X,T,W)
where L(X,T,W) is a "loss function" that corresponds to the specified
subcmd (e.g., logit regression or CBPS), given the data (X,T) and a
vector of weights (W). (The weights are computed using the
propensity scores, as we describe below. The propensity scores are
calculated using b, the data, and the formula given above.) The
available subcmds are listed below.
In Kranker, Blue, and Vollmer Forrow (2019), we proposed adding a
"penalty" to the loss function that lets you effectively prespecify
the variance (or higher-order moments) of the IPW weight
distribution. By constraining the distribution of the weights, you
can choose among alternative sets of matching weights, some of which
produce better balance and others of which yield higher statistical
power. The penalized method solves for b in:
b = argmin L(X,T,W) + f(W)
where f(W) is a smooth, flexible function that increases as the
vector of observation weights (W) becomes more variable. The penalty
options control the functional form of f(W); see details below.
Once the b is estimated, we can compute propensity scores (p) for
each observation with the formula given above and the observation's
matching variables (tmvarlist). The propensity scores are returned
in a variable named _pscore.
Once propensity scores are computed for each observation, we can
compute IPW "matching weights" for each observation. The formulas
for the IPW weights depend on whether you request weights for
estimating the average treatment effect (ate), the average treatment
effect on the treated (atet), or the average treatment effect on the
untreated (ateu). First we compute unnormalized weights as follows:
- The unnormalized ate weights are 1/p for treatment group
observations and 1/(1-p) for control group observations.
- The unnormalized atet weights are 1 for treatment group
observations and p/(1-p) for control group observations.
- The unnormalized ateu weights are (1-p)/p for treatment group
observations and 1 for control group observations.
Next, the weights are normalized to have mean equal to 1 in each
group, and returned in the variable named _weight_mtch.
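The weight formulas and the normalization step can be sketched in Python as follows. This is an illustration only, with no sample weights; the function name matching_weights and the example data are hypothetical, and both groups are assumed to be non-empty.

```python
def matching_weights(p, treated, stat="ate"):
    """IPW matching weights, normalized to mean 1 within each group."""
    if stat == "ate":
        raw = [1.0 / pi if t else 1.0 / (1.0 - pi) for pi, t in zip(p, treated)]
    elif stat == "atet":
        raw = [1.0 if t else pi / (1.0 - pi) for pi, t in zip(p, treated)]
    elif stat == "ateu":
        raw = [(1.0 - pi) / pi if t else 1.0 for pi, t in zip(p, treated)]
    else:
        raise ValueError("stat must be 'ate', 'atet', or 'ateu'")

    out = list(raw)
    for group in (0, 1):
        idx = [i for i, t in enumerate(treated) if t == group]
        group_mean = sum(raw[i] for i in idx) / len(idx)
        for i in idx:
            out[i] = raw[i] / group_mean  # rescale so the group mean is 1
    return out

# Made-up propensity scores for two treated and two control observations.
p = [0.2, 0.5, 0.8, 0.4]
treated = [1, 1, 0, 0]
w_atet = matching_weights(p, treated, "atet")
w_ate = matching_weights(p, treated, "ate")
```

With atet, the treated observations keep a weight of 1 and only the control weights vary; with ate, weights vary in both groups.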
Finally, the final weights (a variable named _weight) are set equal
to:
_weight = W :* _weight_mtch
where W are the sample weights. (The variable _weight equals
_weight_mtch if no sample weights are provided. If sample weights
are provided, the weights are normalized so the weighted mean equals
1 in each group.) For convenience, (a copy of) the treatment group
indicator variable is returned in a variable named _treated.
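One reading of the sample-weight case can be sketched in Python. It assumes that "weighted mean equals 1" means sum(W * _weight_mtch) / sum(W) = 1 within each group; that interpretation, the function name final_weights, and the data are assumptions of this example, not taken from the package source.

```python
def final_weights(raw, treated, sample_w):
    """Normalize raw matching weights to a sample-weighted mean of 1 in each
    group, then multiply by the sample weights (assumed interpretation)."""
    mtch = list(raw)
    for group in (0, 1):
        idx = [i for i, t in enumerate(treated) if t == group]
        wmean = (sum(sample_w[i] * raw[i] for i in idx)
                 / sum(sample_w[i] for i in idx))
        for i in idx:
            mtch[i] = raw[i] / wmean
    weight = [s * m for s, m in zip(sample_w, mtch)]
    return mtch, weight

# Made-up unnormalized weights and sample weights.
raw = [4.0, 2.0, 1.0, 3.0]
treated = [1, 1, 0, 0]
sample_w = [1.0, 3.0, 2.0, 2.0]
mtch, weight = final_weights(raw, treated, sample_w)
```

Under this reading, the final weights in each group sum to that group's total sample weight.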
After estimation, psweight subcmd will display the model coefficients b
and a summary of the newly constructed variables.
Postestimation commands: psweight call
The psweight subcmd and psweight balance Stata commands are
"wrappers" around the psweight Mata class. When either command is
run, it constructs an instance of the class, and this instance
remains accessible to psweight call afterward.
Specifically, psweight call can be used to access the class functions
or member variables. A list of available functions (classfunction())
and member variables (classvariable) is available at: psweight class
For example, the following code would calculate traditional IPW
weights and then construct a balance table for the reweighted sample:
. psweight ipw mbsmoke mmarried mage fbaby medu, treatvariance
. psweight call balanceresults()
You can also save the results of the function to a Mata variable, for
example:
. psweight call mynewvar = stddiff()
Note that any default options that were overridden when psweight
subcmd was called will continue to be applied with psweight call. In
the example above, the balance table will use the treatment group's
variance to calculate standardized differences (rather than the
default variance). In general, I tried to use views rather than Mata
variables to store data; you could run into problems if you add,
drop, or sort your data before using psweight call.
After you finish using the instance of the class, it can be deleted
with:
. mata: mata drop psweight_ado_most_recent
Options
    +--------+
----+ subcmd +-----------------------------------------------------------
The subcmd specifies which method is used to compute coefficients b for
the propensity score model. The seven available estimation methods are:
The ipw subcmd fits a logit regression model by maximum likelihood.
The logit regression solves for b in the model:
Prob(tvar = 1 | X) = invlogit(X * b)
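For reference, the quantity that maximum likelihood maximizes for this model is the logit log-likelihood. The Python sketch below only evaluates it for given coefficients; it is not the package's optimizer, and the data and the name logit_loglik are made up for illustration.

```python
import math

def logit_loglik(b, X, t):
    """Log-likelihood of a logit model with coefficients b.

    X is a list of rows (covariates, including any constant) and
    t is the list of 0/1 treatment indicators.
    """
    ll = 0.0
    for row, ti in zip(X, t):
        z = sum(x * coef for x, coef in zip(row, b))
        # log(invlogit(z)) = -log(1 + exp(-z))
        # log(1 - invlogit(z)) = -log(1 + exp(z))
        ll += -math.log1p(math.exp(-z)) if ti else -math.log1p(math.exp(z))
    return ll

# Made-up data: a constant and one binary matching variable.
X = [[1, 0], [1, 0], [1, 1], [1, 1], [1, 0], [1, 1]]
t = [0, 0, 1, 1, 1, 0]
ll_null = logit_loglik([0.0, 0.0], X, t)   # every p equals 0.5
ll_alt = logit_loglik([-0.7, 1.4], X, t)   # p near 1/3 when x=0, near 2/3 when x=1
```

Coefficients that track the observed treatment rates more closely give a higher log-likelihood, which is why ll_alt exceeds ll_null here.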
The cbps subcmd computes just-identified covariate-balancing
propensity scores (Imai and Ratkovic 2014).
The cbpsoid subcmd computes over-identified covariate-balancing
propensity scores (Imai and Ratkovic 2014).
The pcbps subcmd implements penalized covariate-balancing propensity
scores (Kranker, Blue, and Vollmer Forrow 2019). The pcbps subcmd
requires that at least one penalty option be specified. This is a
synonym for the cbps subcmd when one or more of the penalty options
is included.
The mean_sd_sq or mean_sd subcmds find the model coefficients that
minimize the quantity: mean(stddiff())^2. That is, the weights
minimize the mean standardized difference, squared.
The sd_sq or sd subcmds find the model coefficients that minimize the
quantity: sum(stddiff():^2). That is, the weights minimize the
squared (standardized) differences in means of tmvarlist between the
treatment and control groups. If you have more than one variable in
tmvarlist, the squared (standardized) differences in means are
summed.
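The difference between the two quantities can be seen in a small Python sketch. It assumes stddiff() returns one signed standardized difference per variable in tmvarlist; the function names mirror the subcmd names, and the input values are hypothetical.

```python
def mean_sd_sq(stddiff):
    """mean(stddiff())^2: the square of the mean standardized difference."""
    return (sum(stddiff) / len(stddiff)) ** 2

def sd_sq(stddiff):
    """sum(stddiff():^2): the sum of squared standardized differences."""
    return sum(d * d for d in stddiff)

# Hypothetical standardized differences for two matching variables.
stddiff = [0.3, -0.3]
loss_mean = mean_sd_sq(stddiff)  # differences of opposite sign offset each other
loss_sum = sd_sq(stddiff)        # every imbalance contributes
```

In this example the mean-based quantity is zero even though neither variable is balanced, while the sum of squares is not.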
The stdprogdiff subcmd finds the model coefficients that minimize the
quantity: sum(stdprogdiff():^2). That is, the weights minimize the
squared (standardized) differences in mean prognostic scores between
the treatment and control groups. If you have more than one outcome
variable, the squared (standardized) differences in prognostic scores
are summed.
The stdprogdiff subcmd requires that dependent variables be
specified (through the depvarlist() option).
Prognostic scores are computed by fitting a linear regression
model of the dependent variable(s) on the tmvarlist by ordinary
least squares using only control group observations, and then
computing predicted values (for the whole sample). This method
of computing prognostic scores follows Hansen (2008) and Stuart,
Lee, and Leacy (2013).
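The prognostic score step can be sketched in Python for the simplest case of one outcome and one matching variable. This is an illustration under that simplifying assumption, without sample weights; the function name and the data are hypothetical.

```python
def prognostic_scores(x, y, treated):
    """Fit y = a + c*x by OLS on control rows only, then predict for all rows."""
    xs = [xi for xi, t in zip(x, treated) if t == 0]
    ys = [yi for yi, t in zip(y, treated) if t == 0]
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    sxx = sum((xi - xbar) ** 2 for xi in xs)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    # Predicted values for the whole sample, treated rows included.
    return [intercept + slope * xi for xi in x]

# Made-up data: the control outcomes follow y = 1 + 2x exactly.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 100.0, 200.0]
treated = [0, 0, 0, 1, 1]
scores = prognostic_scores(x, y, treated)
```

The treated observations' outcomes (100 and 200) play no role in the fit, yet those observations still receive predicted values.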
    +------+
----+ Stat +-------------------------------------------------------------
stat is one of three statistics: ate, atet, or ateu. ate is the default.
The stat dictates how the command uses propensity scores (p) to compute
IPW "matching weights" (the variable named _weight_mtch).
ate specifies that the average treatment effect be estimated.
atet specifies that the average treatment effect on the treated be
estimated.
ateu specifies that the average treatment effect on the untreated be
estimated.
The formulas used for computing IPW weights for each of these three stats
are described above.
    +---------+
----+ Penalty +----------------------------------------------------------
The penalty options determine the function, f(W), that we use to modify
the loss function (L(X,T,W)). If none of these options are specified,
f(W)=0.
cvtarget(# # #) applies a penalty using the coefficient of variation
of the weight distribution. If cvtarget(a, b, c) is specified, then
the loss function is modified as:
L'(X,T,W) = L(X,T,W) + a * abs((wgt_cv() - b)^c)
The default is no penalty: cvtarget(0 . .).
skewtarget(# # #) applies a penalty using the skewness of the weight
distribution. If skewtarget(d, e, f) is specified, then the loss
function is modified as:
L'(X,T,W) = L(X,T,W) + d * abs((wgt_skewness() - e)^f)
The default is no penalty: skewtarget(0 . .).
kurttarget(# # #) applies a penalty using the excess kurtosis of the
weight distribution. If kurttarget(g, h, i) is specified, then
the loss function is modified as:
L'(X,T,W) = L(X,T,W) + g * abs((wgt_kurtosis() - h)^i)
The default is no penalty: kurttarget(0 . .).
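To make the penalty term concrete, here is a Python sketch of the cvtarget() case. The helper cv() is a stand-in for wgt_cv(); computing it as the sample standard deviation (n-1 denominator) divided by the mean is an assumption of this example, as are the weight values.

```python
import math

def cv(w):
    """Coefficient of variation: standard deviation divided by the mean."""
    mean = sum(w) / len(w)
    var = sum((wi - mean) ** 2 for wi in w) / (len(w) - 1)  # assumed n-1
    return math.sqrt(var) / mean

def cv_penalty(w, a, b, c):
    """Penalty term a * abs((cv - b)^c) added to the loss function."""
    return a * abs((cv(w) - b) ** c)

# Hypothetical weights and the penalty from cvtarget(1 .5 6).
at_target = cv_penalty([0.5, 1.0, 1.5], 1, 0.5, 6)   # cv is exactly 0.5
off_target = cv_penalty([1.0, 1.0, 1.0], 1, 0.5, 6)  # cv is 0
no_penalty = cv_penalty([0.5, 1.0, 1.5], 0, 0.5, 6)  # a = 0 switches it off
```

The term is zero when the coefficient of variation equals the target b and grows as it moves away in either direction; a scales the penalty and c controls how steeply it rises.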
    +----------+
----+ Variance +---------------------------------------------------------
variance is one of four options: pooledvariance, controlvariance,
treatvariance, or averagevariance. pooledvariance is the default. The
variance dictates how the command standardizes the difference in means
between the treatment and control groups. Standardized differences are
used to compute the loss function for some subcmds and for computing
balance tables.
pooledvariance uses the pooled (treatment plus control) sample's
variances to calculate standardized differences; the default
controlvariance uses the control group's variances to calculate
standardized differences
treatvariance uses the treatment group's variances to calculate
standardized differences
averagevariance uses (the control group's variances + treatment
group's variances)/2 to calculate standardized differences (as in
tebalance)
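The four choices differ only in the variance used in the denominator. The Python sketch below shows this for a single unweighted variable; the sign convention (treatment mean minus control mean), the n-1 variance denominator, and the data are assumptions of this example.

```python
import math

def _mean(v):
    return sum(v) / len(v)

def _var(v):
    m = _mean(v)
    return sum((x - m) ** 2 for x in v) / (len(v) - 1)

def stddiff(x_treat, x_ctrl, variance="pooledvariance"):
    """Difference in means divided by the square root of the chosen variance."""
    vt, vc = _var(x_treat), _var(x_ctrl)
    if variance == "pooledvariance":
        v = _var(list(x_treat) + list(x_ctrl))  # treatment plus control sample
    elif variance == "controlvariance":
        v = vc
    elif variance == "treatvariance":
        v = vt
    elif variance == "averagevariance":
        v = (vc + vt) / 2
    else:
        raise ValueError("unknown variance option")
    return (_mean(x_treat) - _mean(x_ctrl)) / math.sqrt(v)

# Made-up values: means 4 and 2, variances 4 and 1.
x_treat = [2.0, 4.0, 6.0]
x_ctrl = [1.0, 2.0, 3.0]
results = {opt: stddiff(x_treat, x_ctrl, opt)
           for opt in ("pooledvariance", "controlvariance",
                       "treatvariance", "averagevariance")}
```

The same difference in means yields a standardized difference of 2 with controlvariance and 1 with treatvariance here, so the choice matters whenever the two groups' variances differ.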
    +---------------+
----+ Other options +----------------------------------------------------
depvarlist(varlist) specifies dependent variables (outcome variables).
The data for the treatment group observations are ignored, but the
data for the control group are used to compute prognostic scores
(see above).
display_options:
The following options control the display of the coefficient tables:
noomitted, vsquish, noemptycells, baselevels, allbaselevels,
nofvlabel, fvwrap(#), fvwrapon(style), cformat(%fmt), and
nolstretch; see [R] estimation options.
The following options control the display of the balance tables:
formats(%fmt), noomitted, vsquish, noemptycells, baselevels,
allbaselevels, nofvlabel, fvwrap(#), fvwrapon(style), and
nolstretch; see _matrix_table.
maximize_options control the maximization process: from(init_specs),
trace, gradient, hessian, showstep, technique(algorithm_specs),
iterate(#), tolerance(#), ltolerance(#), nrtolerance(#),
qtolerance(#), nonrtolerance, showtolerance, and difficult. For a
description of these options, see [R] maximize and
moptimize_init_mlopts().
ntable displays a table with sample sizes and the sum of the weights.
coeflegend; see [R] estimation options.
    +------------------+
----+ psweight balance +-------------------------------------------------
mweight(varname) is a variable containing user-specified "matching"
weights. This variable is the analogue to the _weight_mtch variable;
it should not be multiplied with the sample weights.
If mweight(varname) is not specified, the balance table is
constructed with "unweighted" data. (Only the sample weights are
applied.)
As explained in the syntax, psweight balance also allows many of the
options listed above.
Examples
Setup
. webuse cattaneo2
Balance before reweighting
. psweight balance mbsmoke mmarried mage fbaby medu
Estimate the average treatment effect of smoking on birthweight, using a
logit model to predict treatment status
. psweight ipw mbsmoke mmarried mage fbaby medu
. psweight call balanceresults()
Estimate the average treatment effect on the treated with CBPS
. psweight cbps mbsmoke mmarried mage fbaby medu, atet
. psweight call balanceresults()
Estimate the average treatment effect on the treated with Penalized CBPS
. psweight pcbps mbsmoke mmarried mage fbaby medu, atet cvtarget(1 .5
6)
. psweight call balanceresults()
For more examples, see psweight_example.do
Author
By Keith Kranker
Mathematica
Suggested Citation:
Kranker, K. "psweight: IPW- and CBPS-type propensity score
reweighting, with various extensions," Statistical Software
Components S458657, Boston College Department of Economics, 2019.
Available at https://ideas.repec.org/c/boc/bocode/s458657.html.
- or -
Kranker, K., L. Blue, and L. Vollmer Forrow. "Improving Effect
Estimates by Limiting the Variability in Inverse Propensity Score
Weights." Manuscript under review, 2019.
Acknowledgements
My coauthors, Laura Blue and Lauren Vollmer Forrow, were closely involved
with the development of the Penalized CBPS methodology. We received
many helpful suggestions from our colleagues at Mathematica, especially
those on the Comprehensive Primary Care Plus Evaluation team. Of note, I
thank Liz Potamites for testing early versions of the program and
providing helpful feedback.
The code for implementing the CBPS method is based on work by Fong et al.
(2018), namely the CBPS package for R. I also reviewed the Stata CBPS
implementation by Filip Premik.
Source code is available at https://github.com/kkranker/psweight. Please
report issues at https://github.com/kkranker/psweight/issues.
Stored results
psweight subcmd and psweight balance store the following in e():
Scalars
e(N) number of observations
Macros
e(cmd) psweight
e(cmdline) command as typed
e(properties) b
e(subcmd) specified subcmd
e(tvar) name of the treatment indicator (tvar)
e(tmvarlist) names of the matching variables (tmvarlist)
e(depvarlist) names of dependent variables (if any)
e(variance) specified variance
e(wtype) weight type (if any)
e(wexp) weight expression (if any)
e(cvopt) specified penalty (if any)
In addition, psweight subcmd stores the following in e():
Macros
e(stat) specified stat
Matrices
e(b) coefficient vector
Functions
e(sample) marks estimation sample
In addition, psweight balance stores a matrix named r(bal) and other
output in r(); see the documentation for balancetable().
The ntable option stores a matrix named r(N_table) and other output
in r(); see the documentation for get_N().
References
Fong, C., M. Ratkovic, K. Imai, C. Hazlett, X. Yang, and S. Peng. 2018.
CBPS: Covariate Balancing Propensity Score, Package for the R
programming language, The Comprehensive R Archive Network.
Available at: https://CRAN.R-project.org/package=CBPS
Hansen, B. B. 2008. "The Prognostic Analogue of the Propensity Score."
Biometrika, 95(2): 481–488, doi:10.1093/biomet/asn004.
Imai, K. and M. Ratkovic. 2014. "Covariate Balancing Propensity Score."
Journal of the Royal Statistical Society: Series B (Statistical
Methodology), 76(1): 243–263, doi:10.1111/rssb.12027.
Kranker, K., L. Blue, and L. Vollmer Forrow. 2019. "Improving Effect
Estimates by Limiting the Variability in Inverse Propensity
Score Weights." Manuscript under review.
Stuart, E. A., B. K. Lee, and F. P. Leacy. 2013. "Prognostic
Score–based Balance Measures Can Be a Useful Diagnostic for
Propensity Score Methods in Comparative Effectiveness Research."
Journal of Clinical Epidemiology, 66(8): S84–S90.e1,