```{r setup,include=FALSE}
# knitr options for the README; we do want to echo code here.
opts_knit$set(eval.after='fig.cap')
opts_chunk$set(warning=FALSE,message=FALSE)
opts_chunk$set(cache=TRUE,cache.path="cache/")
opts_chunk$set(fig.path="man/figures/",dev=c("png"))
opts_chunk$set(fig.width=6,fig.height=5,dpi=256)
# text wrapping:
options(width=64,digits=2)
opts_chunk$set(size="small")
opts_chunk$set(tidy=TRUE,tidy.opts=list(width.cutoff=50,keep.blank.line=TRUE))
```
# madness
[![Build Status](https://github.com/shabbychef/madness/workflows/R-CMD-check/badge.svg)](https://github.com/shabbychef/madness/actions)
[![codecov.io](http://codecov.io/github/shabbychef/madness/coverage.svg?branch=master)](http://codecov.io/github/shabbychef/madness?branch=master)
[![CRAN](http://www.r-pkg.org/badges/version/madness)](https://cran.r-project.org/package=madness)
[![Downloads](http://cranlogs.r-pkg.org/badges/madness?color=brightgreen)](http://www.r-pkg.org/pkg/madness)
[![Total](http://cranlogs.r-pkg.org/badges/grand-total/madness?color=green)](http://www.r-pkg.org/pkg/madness)
[![Rdoc](http://www.rdocumentation.org/badges/version/madness)](http://www.rdocumentation.org/packages/madness)
[![License: LGPL v3](https://img.shields.io/badge/License-LGPL%20v3-blue.svg)](https://www.gnu.org/licenses/lgpl-3.0)
'madness' is a 'Multivariate Automatic Differentiation' package for R. It allows one to compute and track the derivative of
multivariate-to-multivariate functions applied to concrete data, via forward differentiation and the chain rule.
The obvious use cases are computing approximate standard errors via the delta method, optimizing
objective functions over vectors of parameters, and party tricks.

-- Steven E. Pav, shabbychef@gmail.com
## Installation
This package can be installed from CRAN,
via [drat](https://github.com/eddelbuettel/drat "drat"), or
from [github](https://www.github.com/shabbychef/madness "madness"):
```{r install,eval=FALSE,echo=TRUE}
# via CRAN:
install.packages('madness')
# via drat:
if (require(drat)) {
drat::addRepo('shabbychef')
install.packages('madness')
}
# via devtools (typically 'master' is stable):
if (require(devtools)) {
install_github('shabbychef/madness')
}
```
# Basic Usage
A `madness` object stores a value, its derivative with respect to the
independent variable, the 'names' of the dependent and independent variables,
and, optionally, the variance-covariance matrix of the independent variable.
Most of these are filled in with sane defaults (the derivative defaults to the
identity matrix, and so on):
```{r tryit,cache=FALSE,eval=TRUE,echo=TRUE}
require(madness)
set.seed(1234)
x <- matrix(rnorm(16),ncol=2)
madx <- madness(x,vtag='y',xtag='x')
# the show method could use some work
print(madx)
```
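The stored value and derivative can be pulled back out with `val` and `dvdx`, both of
which are used throughout this README; a quick illustrative peek:

```r
# extract the value and the derivative of madx from above; since madx was
# created with the default derivative, dvdx gives the 16 x 16 identity
# (x has 16 elements).
val(madx)
dvdx(madx)
```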
You can then perform operations on `madness` objects, and the derivative is propagated
forward via the chain rule:
```{r tryit2,cache=FALSE,eval=TRUE,echo=TRUE}
madx2 <- cbind(crossprod(2 * madx),diag(2))
print(madx2)
madx3 <- colSums(madx2)
print(madx3)
madx4 <- norm(log(abs(1 + madx2^2)),'F')
print(madx4)
```
## Variance-Covariance
You can optionally attach the variance-covariance matrix of the 'X' variable to a
`madness` object. Then the estimated variance-covariance of computed quantities
can be retrieved via the `vcov` method:
```{r tryit3,cache=FALSE,eval=TRUE,echo=TRUE}
set.seed(456)
# create some fake data:
nobs <- 1000
adf <- data.frame(x=rnorm(nobs),y=runif(nobs),eps=rnorm(nobs))
adf$z <- 2 * adf$x - 3 * adf$y + 0.5 * adf$eps
# perform linear regression on it
lmod <- lm(z ~ x + y,data=adf)
# guess what? you can stuff it into a madness directly
amad <- as.madness(lmod)
print(vcov(amad))
# now, say, take the norm
mynorm <- sqrt(crossprod(amad))
print(mynorm)
print(vcov(mynorm))
```
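The variance-covariance can also be attached when the object is constructed. A minimal
sketch, assuming the constructor takes the variance-covariance via a `varx` argument
(an assumption; consult `?madness` for the actual signature):

```r
# hypothetical illustration: attach a hand-computed variance-covariance
# of the sample means of (x, y) from the fake data above.
# NB: 'varx' is an assumed argument name.
mu.est  <- matrix(colMeans(adf[,c('x','y')]),ncol=1)
vcv.est <- cov(adf[,c('x','y')]) / nrow(adf)
mumad   <- madness(val=mu.est,vtag='mu',xtag='mu',varx=vcv.est)
vcov(sqrt(crossprod(mumad)))
```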
-----------------
![Delta Method All the Things!](tools/figure/GevbfLD.jpg)
# Markowitz portfolio
There are two utilities for computing `madness` objects representing the first two moments
of a sample: `twomoments` and `theta`.
Here we use them to compute the Markowitz portfolio of some assets, along with its estimated variance-covariance.
First, grab the monthly simple returns of the three Fama French factors. (Note that you cannot
directly invest in these, except arguably 'the market'. The point of the example is to use realistic returns, not to provide
investment advice.)
```{r quandl,cache=TRUE,eval=TRUE,echo=TRUE}
# Fama French Data are no longer on Quandl, use my homebrew package instead:
if (!require(aqfb.data) && require(devtools)) {
devtools::install_github('shabbychef/aqfb_data')
}
require(aqfb.data)
data(mff4)
ffrets <- 1e-2 * as.matrix(mff4[,c('Mkt','SMB','HML')])
```
Now estimate the first two moments via `twomoments`, and compute the Markowitz
portfolio (the solution `w` of `Sigma w = mu`), its variance-covariance, and marginal Wald statistics:
```{r markowitz,cache=TRUE,eval=TRUE,echo=TRUE}
# compute first two moments
twom <- twomoments(ffrets)
markowitz <- solve(twom$Sigma,twom$mu)
print(val(markowitz))
print(vcov(markowitz))
wald <- val(markowitz) / sqrt(diag(vcov(markowitz)))
print(wald)
```
## Does that really work?
I don't know. Let's perform a bunch of simulations to see if the Wald statistics are OK. We
create a population of 5 stocks where the true Markowitz portfolio is -2,-1,0,1,2. We
perform 1000 simulations of 1250 days of returns from that population, computing the Markowitz
portfolio each time. We then take the difference between the estimated and true Markowitz portfolios,
divided by the estimated standard errors.
```{r marksym,cache=TRUE,eval=TRUE,echo=TRUE}
genrows <- function(nsim,mu,haSg) {
p <- length(mu)
X <- matrix(rnorm(nsim*p),nrow=nsim,ncol=p) %*% haSg
X <- t(rep(mu,nsim) + t(X))
}
true.mp <- array(seq(-2,2))
dim(true.mp) <- c(length(true.mp),1)
p <- length(true.mp)
set.seed(92385)
true.Sigma <- 1.7e-4 * (3/(p+5)) * crossprod(matrix(runif((p+5)*p,min=-1,max=1),ncol=p))
true.mu <- true.Sigma %*% true.mp
haSigma <- chol(true.Sigma)
nsim <- 1000
ndays <- 1250
set.seed(23891)
retv <- replicate(nsim,{
X <- genrows(ndays,true.mu,haSigma)
twom <- twomoments(X)
markowitz <- solve(twom$Sigma,twom$mu)
marginal.wald <- (val(markowitz) - true.mp) / sqrt(diag(vcov(markowitz)))
})
retv <- aperm(retv,c(1,3,2))
```
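Before plotting, a quick numeric sanity check; `retv` now holds the marginal Wald
statistics, one row per portfolio element, so each row should be approximately
standard normal if the delta-method standard errors are well calibrated. A sketch:

```r
# each row of retv[,,1] collects one element's Wald statistics over simulations
rowMeans(retv[,,1])    # should be near zero
apply(retv[,,1],1,sd)  # should be near one
```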
This should be approximately normal, so we Q-Q plot against
normality. LGTM.
```{r marksym-check,cache=TRUE,eval=TRUE,echo=TRUE,dpi=200,out.width='600px',out.height='500px'}
require(ggplot2)
# qplot(...,stat='qq') is deprecated in recent ggplot2; use stat_qq directly
ph <- ggplot(data.frame(wald=retv[1,,1]),aes(sample=wald)) +
	stat_qq() + geom_abline(intercept=0,slope=1,colour='red')
print(ph)
```
# Trace of the covariance matrix
Consider the case of an 8-vector drawn from some population.
One observes some fixed number of independent observations of the vector,
computes the sample covariance matrix, then computes its trace. Using a
`madness` object, one can automagically estimate the standard error via the delta method.
The sample calculation looks as follows:
```{r matracer,cache=TRUE,eval=TRUE,echo=TRUE,dpi=200,out.width='600px',out.height='500px'}
genrows <- function(nsim,mu,haSg) {
p <- length(mu)
X <- matrix(rnorm(nsim*p),nrow=nsim,ncol=p) %*% haSg
X <- t(rep(mu,nsim) + t(X))
}
p <- 8
set.seed(8644)
true.mu <- array(rnorm(p),dim=c(p,1))
true.Sigma <- diag(runif(p,min=1,max=20))
true.trace <- matrix.trace(true.Sigma)
haSigma <- chol(true.Sigma)
ndays <- 1250
X <- genrows(ndays,true.mu,haSigma)
twom <- twomoments(X,df=1)
matt <- matrix.trace(twom$Sigma)
# now perform some simulations to see if these are accurate:
nsim <- 1000
set.seed(23401)
retv <- replicate(nsim,{
X <- genrows(ndays,true.mu,haSigma)
twom <- twomoments(X,df=1)
matt <- matrix.trace(twom$Sigma)
marginal.wald <- (val(matt) - true.trace) / sqrt(diag(vcov(matt)))
})
require(ggplot2)
ph <- ggplot(data.frame(wald=as.numeric(retv)),aes(sample=wald)) +
	stat_qq() + geom_abline(intercept=0,slope=1,colour='red')
print(ph)
```
# Maximum eigenvalue of the covariance matrix
Consider the case of a 10-vector drawn from a population with covariance matrix whose largest eigenvalue
is, say, 17.
One observes some fixed number of independent observations of the 10-vector, and
computes the sample covariance matrix, then computes the maximum eigenvalue. Using a
`madness` object, one can automagically estimate the standard error via the delta method.
The sample calculation looks as follows:
```{r cosym_simple,cache=TRUE,eval=TRUE,echo=TRUE}
genrows <- function(nsim,mu,haSg) {
p <- length(mu)
X <- matrix(rnorm(nsim*p),nrow=nsim,ncol=p) %*% haSg
X <- t(rep(mu,nsim) + t(X))
}
p <- 10
set.seed(123950)
true.mu <- array(rnorm(p),dim=c(p,1))
true.maxe <- 17
true.Sigma <- diag(c(true.maxe,runif(p-1,min=1,max=16)))
haSigma <- chol(true.Sigma)
ndays <- 1250
X <- genrows(ndays,true.mu,haSigma)
twom <- twomoments(X,df=1)
maxe <- maxeig(twom$Sigma)
print(maxe)
print(vcov(maxe))
```
Now perform some simulations to see if these are accurate:
```{r cosym-mo,cache=TRUE,eval=TRUE,echo=TRUE,dpi=200,out.width='600px',out.height='500px'}
nsim <- 1000
set.seed(23401)
retv <- replicate(nsim,{
X <- genrows(ndays,true.mu,haSigma)
twom <- twomoments(X,df=1)
maxe <- maxeig(twom$Sigma)
marginal.wald <- (val(maxe) - true.maxe) / sqrt(diag(vcov(maxe)))
})
require(ggplot2)
ph <- ggplot(data.frame(wald=as.numeric(retv)),aes(sample=wald)) +
	stat_qq() + geom_abline(intercept=0,slope=1,colour='red')
print(ph)
```
Why the bias? There are a number of possibilities:

1. Broken example. Am I really checking what I intended?
2. Broken derivative code. Easy to check.
3. Failure to take symmetry into account.
4. Sample size not large enough. (ha ha.)
5. Derivative equal or near zero, requiring a second-order term in the delta method expansion.
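One cheap check on the last two possibilities is to compare the delta-method standard
error against the empirical spread of the estimator itself; a minimal sketch, reusing
`genrows`, `true.mu`, `haSigma`, `ndays` and `true.maxe` from above:

```r
# compare delta-method SEs to the empirical SD of maxeig over simulations;
# a scale mismatch points at the derivative, a shifted mean at the estimator.
sims <- replicate(500,{
	me <- maxeig(twomoments(genrows(ndays,true.mu,haSigma),df=1)$Sigma)
	c(est=val(me),se=sqrt(diag(vcov(me))))
})
mean(sims['est',]) - true.maxe        # bias of the point estimate
sd(sims['est',]) / mean(sims['se',])  # near one if the SEs are well scaled
```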
# Enough already, bring me some Scotch!
An example is in order. Consider the tasting data compiled by 'Nessie' on 86 Scotch whiskies. The data
are available online and look like so:
```{r loadscotch,cache=TRUE,eval=TRUE,echo=TRUE}
library(curl)
wsky <- read.csv(curl('https://www.mathstat.strath.ac.uk/outreach/nessie/datasets/whiskies.txt'),
stringsAsFactors=FALSE)
kable(head(wsky))
```
A bizarre question one could ask of these data is whether the taste characteristics are related
to the geographic coordinates of the distilleries. One way to pose this is to perform a linear
regression of the taste values on the geographic coordinates. This is a many-to-many regression.
The Multivariate General Linear Hypothesis (MGLH) is a general hypothesis about the regression coefficients
in this case. The 'usual' application is the omnibus test of whether all regression coefficients
are zero. The MGLH is classically approached by four different tests, which typically give the same
answer.

First, we grab the geographic and taste data, prepend a one to the feature vector, take an outer product,
and compute the mean and covariance. The MGLH statistics can be posed in terms of the eigenvalues
of a certain matrix. Here the statistics are computed so as to equal zero under the null
hypothesis of all-zero linear regression coefficients from geography to taste. We get approximate
standard errors from the delta method, and compute Wald statistics.
```{r mglh_setup,eval=TRUE,echo=TRUE}
x <- cbind(1,1e-5 * wsky[,c('Latitude','Longitude')])
y <- wsky[,c('Body','Sweetness','Smoky','Medicinal','Tobacco','Honey','Spicy','Winey','Nutty','Malty','Fruity','Floral')]
xy <- cbind(x,y)
# estimate second moment matrix:
xytheta <- theta(xy)
nfeat <- ncol(x)
ntgt <- ncol(y)
GammaB <- xytheta[1:nfeat,nfeat+(1:ntgt)]
SiB <- - solve(xytheta)[nfeat+(1:ntgt),1:nfeat]
EH <- SiB %*% GammaB
oEH <- diag(ntgt) + EH
HLT <- matrix.trace(oEH) - ntgt
PBT <- ntgt - matrix.trace(solve(oEH))
# this is not the LRT, but log of 1/LRT. sue me.
LRT <- log(det(oEH))
RLR <- maxeig(oEH) - 1
MGLH <- c(HLT,PBT,LRT,RLR)
walds <- val(MGLH) / sqrt(diag(vcov(MGLH)))
# put them together to show them:
preso <- data.frame(type=c('HLT','PBT','LRT','RLR'),
stat=as.numeric(MGLH),
Wald.stat=as.numeric(walds))
kable(preso)
```
These all cast doubt on the hypothesis of 'no connection between geography and taste', although
I am accustomed to seeing Wald statistics that are nearly equivalent across the four tests. This is still beta code.
We also have the estimated variance-covariance of the vector of
MGLH statistics, turned into a correlation matrix here. The four test statistics have positively
correlated errors, meaning we should not be much more confident if all four suggest
rejecting the null.
```{r mglh_mo,eval=TRUE,echo=TRUE,cache=FALSE}
print(cov2cor(vcov(MGLH)))
```
# Correctness
The following are cribbed from the unit tests (of which there are never enough). First
we define a function which computes approximate derivatives numerically, then test
the `madness` derivatives against it on a number of functions:
```{r correctness,cache=FALSE,eval=TRUE,echo=TRUE}
require(madness)
require(testthat)
apx_deriv <- function(xval,thefun,eps=1e-9,type=c('forward','central')) {
type <- match.arg(type)
yval <- thefun(xval)
dapx <- matrix(0,length(yval),length(xval))
for (iii in seq_len(length(xval))) {
xalt <- xval
xalt[iii] <- xalt[iii] + eps
yplus <- thefun(xalt)
dydx <- switch(type,
forward={ (yplus - yval) / eps },
central={
xalt <- xval
xalt[iii] <- xalt[iii] - eps
yneg <- thefun(xalt)
(yplus - yneg) / (2*eps)
})
dapx[,iii] <- as.numeric(dydx)
}
dapx
}
test_harness <- function(xval,thefun,scalfun=thefun,eps=1e-7) {
xobj <- madness(val=xval,vtag='x',xtag='x')
yobj <- thefun(xobj)
xval <- val(xobj)
dapx <- apx_deriv(xval,scalfun,eps=eps,type='central')
# compute error:
dcmp <- dvdx(yobj)
dim(dcmp) <- dim(dapx)
merror <- abs(dapx - dcmp)
rerror <- merror / (0.5 * pmax(sqrt(eps),(abs(dapx) + abs(dcmp))))
rerror[dapx == 0 & dcmp == 0] <- 0
max(abs(rerror))
}
# now test a bunch:
set.seed(2015)
xval <- matrix(1 + runif(4*4),nrow=4)
yval <- matrix(1 + runif(length(xval)),nrow=nrow(xval))
expect_lt(test_harness(xval,function(x) { x + x }),1e-6)
expect_lt(test_harness(xval,function(x) { x * yval }),1e-6)
expect_lt(test_harness(xval,function(x) { yval / x }),1e-6)
expect_lt(test_harness(xval,function(x) { x ^ x }),1e-6)
expect_lt(test_harness(xval,function(x) { x %*% x }),1e-6)
expect_lt(test_harness(xval,function(x) { x %*% yval }),1e-6)
expect_lt(test_harness(abs(xval),function(x) { log(x) }),1e-6)
expect_lt(test_harness(xval,function(x) { exp(x) }),1e-6)
expect_lt(test_harness(abs(xval),function(x) { log10(x) }),1e-6)
expect_lt(test_harness(xval,function(x) { sqrt(x) }),1e-6)
expect_lt(test_harness(xval,function(x) { matrix.trace(x) },function(matx) { sum(diag(matx)) }),1e-6)
expect_lt(test_harness(xval,function(x) { colSums(x) }),1e-6)
expect_lt(test_harness(xval,function(x) { colMeans(x) }),1e-6)
expect_lt(test_harness(xval,function(x) { vec(x) },function(x) { dim(x) <- c(length(x),1); x }),1e-6)
expect_lt(test_harness(xval,function(x) { vech(x) },
function(x) { x <- x[row(x) >= col(x)];
dim(x) <- c(length(x),1)
x }),1e-6)
expect_lt(test_harness(xval,function(x) { tril(x) },
function(x) { x[row(x) < col(x)] <- 0; x }),1e-6)
expect_lt(test_harness(xval,function(x) { triu(x) },
function(x) { x[row(x) > col(x)] <- 0; x }),1e-6)
expect_lt(test_harness(xval,function(x) { det(x) }),1e-6)
expect_lt(test_harness(xval,function(x) { determinant(x,logarithm=TRUE)$modulus }),1e-6)
expect_lt(test_harness(xval,function(x) { determinant(x,logarithm=FALSE)$modulus }),1e-6)
expect_lt(test_harness(xval,function(x) { rowSums(x) }),1e-6)
expect_lt(test_harness(xval,function(x) { rowMeans(x) }),1e-6)
set.seed(2015)
zval <- matrix(0.01 + runif(4*100,min=0,max=0.05),nrow=4)
xval <- tcrossprod(zval)
expect_lt(test_harness(xval,function(x) { norm(x,'O') }),1e-6)
expect_lt(test_harness(xval,function(x) { norm(x,'I') }),1e-6)
# Matrix::norm does not support type '2'
expect_lt(test_harness(xval,function(x) { norm(x,'2') },
function(x) { base::norm(x,'2') }),1e-6)
expect_lt(test_harness(xval,function(x) { norm(x,'M') }),1e-6)
expect_lt(test_harness(xval,function(x) { norm(x,'F') }),1e-6)
expect_lt(test_harness(xval,function(x) { sqrtm(x) }),1e-6)
```
## Symmetry
Some functions implicitly break symmetry, which can cause the differentiation
process to fail. For example, the `chol` function expects a symmetric
matrix, but only looks at the upper triangular part, ignoring the lower triangular
part. This is demonstrated below. For the moment, the only 'solution' is to enforce
symmetry of the input before the call. Eventually some native functionality around
symmetry may be implemented.
```{r symmetry,cache=TRUE,eval=TRUE,echo=TRUE}
set.seed(2015)
zval <- matrix(0.01 + runif(4*100,min=0,max=0.05),nrow=4)
xval <- tcrossprod(zval)
fsym <- function(x) { 0.5 * (x + t(x)) }
# this will fail:
#expect_lt(test_harness(xval,function(x) { chol(x) }),1e-6)
expect_gte(test_harness(xval,function(x) { chol(x) }),1.99)
# this will not:
expect_lt(test_harness(xval,function(x) { chol(fsym(x)) }),1e-6)
```
The functions `twomoments` and `theta` respect the symmetry of the quantities
being estimated.
# Warnings
This code is a proof of concept. The methods used to compute derivatives are not (yet)
space-efficient or necessarily numerically stable. The user assumes all risk.
Derivatives are stored as a matrix in 'numerator layout'. That is, the dependent and
independent variables are vectorized, and the derivative matrix then has one row per
element of the dependent variable and one column per element of the independent
variable. Thus the derivative of a 3 x 2 x 5 `y` with respect to a 1 x 2 x 2 `x` is a
30 x 4 matrix.
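A small sketch of this convention, using only operations shown above:

```r
# numerator layout: rows index elements of y, columns index elements of x.
x <- madness(matrix(rnorm(6),nrow=3,ncol=2),vtag='x',xtag='x')  # 6 elements
y <- colSums(x)                                                  # 2 elements
dim(dvdx(y))  # 2 x 6
```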