approximations.Rmd
A crucial problem for inference about non-linear models with random effects is that the likelihood function for such models involves integrals for which no analytical solution exists. For given values \(\boldsymbol{b}\) of the random effects, the likelihood function of a conditional logit model (and therefore also of a baseline-logit model) can be written in the form
\[
\mathcal{L}_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})
=
\exp\left(\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\right)
=\exp
\left(
\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})
-\frac12\ln\det(\boldsymbol{\Sigma})
-\frac12\boldsymbol{b}'\boldsymbol{\Sigma}^{-1}\boldsymbol{b}
\right)
\]
However, this “complete data” likelihood function cannot be used for inference, because it depends on the unobserved random effects. To arrive at a likelihood function that depends only on observed data, one needs to use the following integrated likelihood function:
\[
\mathcal{L}_{\text{obs}}(\boldsymbol{y})
=
\int
\exp\left(\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\right)
\,\mathrm{d}\boldsymbol{b}
=
\int
\exp
\left(
\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})
-\frac12\ln\det(\boldsymbol{\Sigma})
-\frac12\boldsymbol{b}'\boldsymbol{\Sigma}^{-1}\boldsymbol{b}
\right)
\,\mathrm{d}\boldsymbol{b}
\]
In general, this integral cannot be “solved”, i.e. eliminated from the formula by analytic means (it is “analytically intractable”). Instead, one will either compute it using numeric techniques (e.g. numerical quadrature) or approximate it using analytical techniques. Unless there is only a single level of random effects, numerical quadrature can become computationally demanding, that is, the computation of the (log-)likelihood function and its derivatives can take a lot of time even on modern, state-of-the-art computer hardware. Approximations based on analytical techniques, on the other hand, may lead to biased estimates, in particular in samples where the number of observations relative to the number of random effects is small, but at least they are much easier to compute, and sometimes they make inference feasible in the first place.
The package “mclogit” supports two kinds of analytical approximations, the Laplace approximation and what one may call the Solomon-Cox approximation. Both approximations are based on a quadratic expansion of the integrand, so that the thus modified integral has a closed-form solution, i.e. is analytically tractable.
The (first-order) Laplace approximation is based on the quadratic expansion of the logarithm of the integrand, the complete-data log-likelihood:
\[
\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\approx
\ell(\boldsymbol{y}|\tilde{\boldsymbol{b}};\boldsymbol{\alpha})
-\frac12\ln\det(\boldsymbol{\Sigma})
-\frac12\tilde{\boldsymbol{b}}'\boldsymbol{\Sigma}^{-1}\tilde{\boldsymbol{b}}
-\frac12
(\boldsymbol{b}-\tilde{\boldsymbol{b}})'
\tilde{\boldsymbol{H}}
(\boldsymbol{b}-\tilde{\boldsymbol{b}})
-\frac12(\boldsymbol{b}-\tilde{\boldsymbol{b}})'\boldsymbol{\Sigma}^{-1}(\boldsymbol{b}-\tilde{\boldsymbol{b}})
\]
where \(\tilde{\boldsymbol{b}}\) is the solution to

\[
\frac{\partial\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})}{\partial\boldsymbol{b}}
= 0
\]

and \(\tilde{\boldsymbol{H}}=\boldsymbol{H}(\tilde{\boldsymbol{b}})\) is the value of the negative Hessian with respect to \(\boldsymbol{b}\),

\[
\boldsymbol{H}(\boldsymbol{b})=-\frac{\partial^2\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})}{\partial\boldsymbol{b}\,\partial\boldsymbol{b}'},
\]

for \(\boldsymbol{b}=\tilde{\boldsymbol{b}}\).
Since this quadratic expansion (let us call it \(\ell_{\text{Lapl}}(\boldsymbol{y},\boldsymbol{b})\)) is a (multivariate) quadratic function of \(\boldsymbol{b}\), the integral of its exponential has a closed-form solution (the relevant formula can be found in harville:matrix.algebra).
For purposes of estimation, the resulting approximate log-likelihood is more useful:

\[
\ell^*_{\text{Lapl}}
=
\ln\int \exp(\ell_{\text{Lapl}}(\boldsymbol{y},\boldsymbol{b}))
\,\mathrm{d}\boldsymbol{b}
=
\ell(\boldsymbol{y}|\tilde{\boldsymbol{b}};\boldsymbol{\alpha})
-
\frac12\tilde{\boldsymbol{b}}'\boldsymbol{\Sigma}^{-1}\tilde{\boldsymbol{b}}
-
\frac12\ln\det(\boldsymbol{\Sigma})
-
\frac12\ln\det\left(\tilde{\boldsymbol{H}}+\boldsymbol{\Sigma}^{-1}\right).
\]
If one disregards the dependence of \(\tilde{\boldsymbol{H}}\) on \(\boldsymbol{\alpha}\) and \(\boldsymbol{b}\), then \(\tilde{\boldsymbol{b}}\) maximizes not only \(\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\) but also \(\ell^*_{\text{Lapl}}\). This motivates the following IWLS/Fisher scoring equations for \(\hat{\boldsymbol{\alpha}}\) and \(\tilde{\boldsymbol{b}}\) (see breslow.clayton:approximate.inference.glmm and this page):
\[
\begin{aligned}
\begin{bmatrix}
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} &
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{Z} \\
\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{X} &
\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{Z}+\boldsymbol{\Sigma}^{-1}\\
\end{bmatrix}
\begin{bmatrix}
\hat{\boldsymbol{\alpha}}\\
\tilde{\boldsymbol{b}}\\
\end{bmatrix}
=
\begin{bmatrix}
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*\\
\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{y}^*
\end{bmatrix}
\end{aligned}
\]
where

\[
\boldsymbol{y}^* = \boldsymbol{X}\boldsymbol{\alpha} +
\boldsymbol{Z}\boldsymbol{b} +
\boldsymbol{W}^{-}(\boldsymbol{y}-\boldsymbol{\pi})
\]

is the IWLS “working dependent variable”, with \(\boldsymbol{\alpha}\), \(\boldsymbol{b}\), \(\boldsymbol{W}\), and \(\boldsymbol{\pi}\) computed in an earlier iteration.
Substitutions lead to the equations:

\[
(\boldsymbol{X}'\boldsymbol{V}^-\boldsymbol{X})\hat{\boldsymbol{\alpha}}
=
\boldsymbol{X}'\boldsymbol{V}^-\boldsymbol{y}^*
\]
and

\[
(\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{Z} +
\boldsymbol{\Sigma}^{-1})\tilde{\boldsymbol{b}} =
\boldsymbol{Z}'\boldsymbol{W}(\boldsymbol{y}^*-\boldsymbol{X}\hat{\boldsymbol{\alpha}})
\]
which can be solved to compute \(\hat{\boldsymbol{\alpha}}\) and \(\tilde{\boldsymbol{b}}\) (for given \(\boldsymbol{\Sigma}\)).
Here

\[
\boldsymbol{V} =
\boldsymbol{W}^-+\boldsymbol{Z}\boldsymbol{\Sigma}\boldsymbol{Z}'
\]

and

\[
\boldsymbol{V}^- = \boldsymbol{W}-
\boldsymbol{W}\boldsymbol{Z}\left(\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{Z}+\boldsymbol{\Sigma}^{-1}\right)^{-1}\boldsymbol{Z}'\boldsymbol{W}.
\]
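This expression for \(\boldsymbol{V}^-\) is an instance of the Woodbury matrix identity, and it can be verified numerically; the following sketch uses small random matrices with purely illustrative dimensions:

```r
## Numerical check of the identity for V^-: the Woodbury matrix identity.
## Dimensions and matrices are arbitrary illustrations.
set.seed(1)
n <- 6; m <- 2
W <- crossprod(matrix(rnorm(n * n), n)) / n              # positive-definite weights
Z <- matrix(rnorm(n * m), n)
Sigma <- diag(m)
V <- solve(W) + Z %*% Sigma %*% t(Z)                     # V = W^- + Z Sigma Z'
V_inv <- W - W %*% Z %*%
  solve(t(Z) %*% W %*% Z + solve(Sigma)) %*% t(Z) %*% W  # the formula above
all.equal(V_inv, solve(V))                               # TRUE, up to rounding
```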
Following breslow.clayton:approximate.inference.glmm, the variance parameters in \(\boldsymbol{\Sigma}\) are estimated by minimizing

\[
q_1 =
\ln\det(\boldsymbol{V})+(\boldsymbol{y}^*-\boldsymbol{X}\boldsymbol{\alpha})'\boldsymbol{V}^-(\boldsymbol{y}^*-\boldsymbol{X}\boldsymbol{\alpha})
\]
or the “REML” variant:

\[
q_2 =
\ln\det(\boldsymbol{V})+(\boldsymbol{y}^*-\boldsymbol{X}\boldsymbol{\alpha})'\boldsymbol{V}^-(\boldsymbol{y}^*-\boldsymbol{X}\boldsymbol{\alpha})+\ln\det(\boldsymbol{X}'\boldsymbol{V}^{-}\boldsymbol{X})
\]
This motivates the following algorithm, which is strongly inspired by the glmmPQL() function in Brian Ripley’s R package MASS. It is a modification of the IWLS algorithm used to fit conditional logit models without random effects: instead of just solving a linear equation in step 3, it estimates a weighted linear mixed-effects model.
In contrast to glmmPQL() it does not use the lme() function from the package nlme for this, because the weighting matrix \(\boldsymbol{W}\) is non-diagonal. Instead, \(q_1\) or \(q_2\) is minimized using the function nlminb from the standard R package “stats”.
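As an illustration of this last step, here is a minimal sketch (not the package's actual internals) of minimizing \(q_1\) over a single variance parameter with nlminb(), profiling out \(\boldsymbol{\alpha}\) by generalized least squares at each trial value; all data objects are made-up stand-ins:

```r
## Minimal sketch: minimizing q_1 over one variance parameter with nlminb().
## X, Z, W and y_star are illustrative stand-ins for the IWLS quantities.
set.seed(1)
n <- 50; m <- 5
X <- cbind(1, rnorm(n))                               # fixed-effects design
Z <- model.matrix(~ 0 + factor(sample(m, n, TRUE)))   # random-effects design
W <- diag(runif(n, 0.5, 1.5))                         # working weight matrix
y_star <- rnorm(n)                                    # working response

q1 <- function(log_sigma2) {
  Sigma <- exp(log_sigma2) * diag(m)                  # Sigma = sigma^2 I
  V <- solve(W) + Z %*% Sigma %*% t(Z)                # V = W^- + Z Sigma Z'
  V_inv <- solve(V)
  alpha <- solve(t(X) %*% V_inv %*% X,
                 t(X) %*% V_inv %*% y_star)           # GLS step for alpha
  r <- y_star - X %*% alpha
  as.numeric(determinant(V)$modulus) +                # ln det(V) ...
    drop(t(r) %*% V_inv %*% r)                        # ... + r' V^- r
}

fit <- nlminb(start = 0, objective = q1)
exp(fit$par)   # the implied estimate of sigma^2
```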
The (first-order) Solomon-Cox approximation is based on the quadratic expansion of the logarithm of the integrand around \(\boldsymbol{b}=\boldsymbol{0}\):
\[
\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\approx
\ell(\boldsymbol{y}|\boldsymbol{0};\boldsymbol{\alpha})
+
\boldsymbol{g}_0'
\boldsymbol{b}
-
\frac12
\boldsymbol{b}'
\boldsymbol{H}_0
\boldsymbol{b}
-\frac12\ln\det(\boldsymbol{\Sigma})
-\frac12\boldsymbol{b}'\boldsymbol{\Sigma}^{-1}\boldsymbol{b}
\]
where \(\boldsymbol{g}_0=\boldsymbol{g}(\boldsymbol{0})\) is the gradient of \(\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})\),

\[
\boldsymbol{g}(\boldsymbol{b})=\frac{\partial\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})}{\partial\boldsymbol{b}},
\]

at \(\boldsymbol{b}=\boldsymbol{0}\), while \(\boldsymbol{H}_0=\boldsymbol{H}(\boldsymbol{0})\) is the negative Hessian at \(\boldsymbol{b}=\boldsymbol{0}\).
Like before, the integral of the exponential of this quadratic expansion (which we refer to as \(\ell_{\text{SC}}(\boldsymbol{y},\boldsymbol{b})\)) has a closed-form solution, as does its logarithm, which is:
\[
\ln\int \exp(\ell_{\text{SC}}(\boldsymbol{y},\boldsymbol{b}))
\,\mathrm{d}\boldsymbol{b}
=
\ell(\boldsymbol{y}|\boldsymbol{0};\boldsymbol{\alpha})
+
\frac12\boldsymbol{g}_0'\left(\boldsymbol{H}_0+\boldsymbol{\Sigma}^{-1}\right)^{-1}\boldsymbol{g}_0
-
\frac12\ln\det(\boldsymbol{\Sigma})
-
\frac12\ln\det\left(\boldsymbol{H}_0+\boldsymbol{\Sigma}^{-1}\right).
\]
The resulting estimation technique is very similar to PQL (again, see breslow.clayton:approximate.inference.glmm for a discussion). The only difference is the construction of the “working dependent” variable \(\boldsymbol{y}^*\). With PQL it is constructed as

\[
\boldsymbol{y}^* =
\boldsymbol{X}\boldsymbol{\alpha} + \boldsymbol{Z}\boldsymbol{b} +
\boldsymbol{W}^{-}(\boldsymbol{y}-\boldsymbol{\pi})
\]

while the MQL working dependent variable is just
\[
\boldsymbol{y}^* = \boldsymbol{X}\boldsymbol{\alpha} +
\boldsymbol{W}^{-}(\boldsymbol{y}-\boldsymbol{\pi})
\]

so that the algorithm takes the same steps as the PQL algorithm above, with this modified working response. The contrast is illustrated by the sketch below.
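In this toy sketch, all quantities are random stand-ins of the right shapes rather than output of a real fit:

```r
## Toy comparison of the PQL and MQL working responses
set.seed(2)
n <- 12; p <- 2; m <- 3
X <- matrix(rnorm(n * p), n); alpha <- rnorm(p)   # fixed part
Z <- matrix(rnorm(n * m), n); b <- rnorm(m)       # random part
W_minus <- diag(n)                                # stand-in for W^-
y <- rbinom(n, 1, 0.5); pi <- rep(0.5, n)         # responses and probabilities

## PQL: expansion around the current b, so Z b enters the working response
y_star_pql <- X %*% alpha + Z %*% b + W_minus %*% (y - pi)
## MQL: expansion around b = 0, so the Z b term is absent
y_star_mql <- X %*% alpha + W_minus %*% (y - pi)
```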
baseline-and-conditional-logit.Rmd
Baseline-category logit models can be expressed as a particular form of conditional logit models. In a conditional logit model (without random effects) the probability that individual \(i\) chooses alternative \(j\) from choice set \(\mathcal{S}_i\) is

\[
\pi_{ij} =
\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}
\]

where

\[
\eta_{ij} = \alpha_1x_{1ij}+\cdots+\alpha_qx_{qij}
\]
In a baseline-category logit model, the set of alternatives is the same for all individuals \(i\), that is, \(\mathcal{S}_i = \{1,\ldots,q\}\), and the linear part of the model can be written as

\[
\eta_{ij} = \beta_{j0}+\beta_{j1}x_{1i}+\cdots+\beta_{jr}x_{ri}
\]

where the coefficients in the equation for the baseline category (\(j=1\)) are all zero, i.e.

\[
\beta_{10} = \cdots = \beta_{1r} = 0
\]
After setting

\[
\begin{aligned}
x_{(g\times(j-1))ij} = d_{gj}, \quad
x_{(g\times(j-1)+h)ij} = d_{gj}x_{hi}, \qquad
\text{with }d_{gj}=
\begin{cases}
0&\text{for } j\neq g\text{, or } j=g=1\\
1&\text{for } j=g\neq 1\\
\end{cases}
\end{aligned}
\]
we have for the log-odds:

\[
\begin{aligned}
\ln\frac{\pi_{ij}}{\pi_{i1}}
&=\beta_{j0}+\beta_{j1}x_{1i}+\cdots+\beta_{jr}x_{ri}
\\
&=\sum_{h}\beta_{jh}x_{hi}=\sum_{g,h}\beta_{jh}d_{gj}x_{hi}
=\sum_{g,h}\alpha_{g\times(j-1)+h}(d_{gj}x_{hi}-d_{g1}x_{hi})
=\sum_{g,h}\alpha_{g\times(j-1)+h}(x_{(g\times(j-1)+h)ij}-x_{(g\times(j-1)+h)i1})\\
&=\alpha_{1}(x_{1ij}-x_{1i1})+\cdots+\alpha_{s}(x_{sij}-x_{si1})
\end{aligned}
\]
where \(\alpha_1=\beta_{21}\), \(\alpha_2=\beta_{22}\), etc.
That is, the baseline-category logit model is translated into a conditional logit model where the alternative-specific values of the attribute variables are interaction terms composed of alternative-specific dummies and individual-specific values of characteristic variables.
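This translation can be checked numerically. The following sketch (with simulated data; it assumes the mclogit package is installed) builds the dummies \(d_{gj}\) and their interactions with a covariate by hand and fits the same model both ways:

```r
## Sketch: a baseline-category logit re-expressed as a conditional logit
library(mclogit)
set.seed(1)
d <- data.frame(y = factor(sample(c("a", "b", "c"), 200, replace = TRUE)),
                x = rnorm(200))

## long format: one row per individual-category combination
long <- expand.grid(i = seq_len(nrow(d)), cat = levels(d$y))
long$choice <- as.integer(d$y[long$i] == long$cat)
long$x <- d$x[long$i]
## category dummies (the baseline "a" gets none) and their interactions with x
long$db <- as.integer(long$cat == "b"); long$xb <- long$db * long$x
long$dc <- as.integer(long$cat == "c"); long$xc <- long$dc * long$x

fit1 <- mblogit(y ~ x, data = d)                       # baseline-category logit
fit2 <- mclogit(cbind(choice, i) ~ db + dc + xb + xc,  # the same model, re-expressed
                data = long)
coef(fit1)   # intercepts and slopes for categories b and c vs baseline a
coef(fit2)   # the same estimates, up to ordering and naming
```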
Analogously, the random-effects extension of the baseline-logit model can be translated into a random-effects conditional logit model where the random intercepts in the logit equations of the baseline-logit model are translated into random slopes of category-specific dummy variables.
baseline-logit.Rmd
Multinomial baseline-category logit models are a generalisation of logistic regression that allows one to model not only binary or dichotomous responses, but also polychotomous responses. In addition, they allow one to model responses in the form of counts that have a pre-determined sum. These models are described in agresti:categorical.data.analysis.2002. Estimating these models is also supported by the function multinom() in the R package "nnet" (MASS). In the package "mclogit", the function to estimate these models is called mblogit() (see the relevant manual page), which uses the infrastructure for estimating conditional logit models, exploiting the fact that baseline-category logit models can be re-expressed as conditional logit models.
Baseline-category logit models are constructed as follows. Suppose a categorical dependent variable or response with categories \(j=1,\ldots,q\) is observed for individuals \(i=1,\ldots,n\). Let \(\pi_{ij}\) denote the probability that the value of the dependent variable for individual \(i\) is equal to \(j\); then the baseline-category logit model takes the form:

\[
\begin{aligned}
\pi_{ij} =
\begin{cases}
\dfrac{\exp(\alpha_{j0}+\alpha_{j1}x_{1i}+\cdots+\alpha_{jr}x_{ri})}
{1+\sum_{k>1}\exp(\alpha_{k0}+\alpha_{k1}x_{1i}+\cdots+\alpha_{kr}x_{ri})}
& \text{for } j>1\\[20pt]
\dfrac{1}
{1+\sum_{k>1}\exp(\alpha_{k0}+\alpha_{k1}x_{1i}+\cdots+\alpha_{kr}x_{ri})}
& \text{for } j=1
\end{cases}
\end{aligned}
\]
where the first category (\(j=1\)) is the baseline category.

Equivalently, the model can be expressed in terms of log-odds, relative to the baseline category:

\[
\ln\frac{\pi_{ij}}{\pi_{i1}}
=
\alpha_{j0}+\alpha_{j1}x_{1i}+\cdots+\alpha_{jr}x_{ri}.
\]

Here the relevant parameters of the model are the coefficients \(\alpha_{jk}\), which describe how the values of the independent variables (numbered \(k=1,\ldots,r\)) affect the relative chances of the response taking the value \(j\) versus taking the value \(1\). Note that there is one coefficient for each independent variable and each response other than the baseline category.
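As a usage sketch, the following fits such a model with mblogit(); it uses the housing data from the MASS package, as the package's own examples do:

```r
## Usage sketch: a baseline-category logit for housing satisfaction
library(mclogit)
data(housing, package = "MASS")
## Sat has the categories "Low", "Medium", "High"; "Low" is the baseline
fit <- mblogit(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
summary(fit)   # one coefficient per covariate and non-baseline category
```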
conditional-logit.Rmd
Conditional logit models are motivated by a variety of considerations, notably as a way to model binary panel data or responses in case-control studies. The variant supported by the package “mclogit” is motivated by the analysis of discrete choices and goes back to mcfadden:conditional.logit.
Here, a series of individuals \(i=1,\ldots,n\) is observed to have made a choice (represented by a number \(j\)) from a choice set \(\mathcal{S}_i\), the set of alternatives at the individual’s disposal. Each alternative \(j\) in the choice set can be described by the values \(x_{1ij},\ldots,x_{rij}\) of \(r\) attribute variables (where the variables are enumerated as \(h=1,\ldots,r\)). (Note that in contrast to the baseline-category logit model, these values vary between choice alternatives.) Conditional logit models then posit that individual \(i\) chooses alternative \(j\) from his or her choice set \(\mathcal{S}_i\) with probability
\[
\pi_{ij} = \frac{\exp(\alpha_1x_{1ij}+\cdots+\alpha_rx_{rij})}
           {\sum_{k\in\mathcal{S}_i}\exp(\alpha_1x_{1ik}+\cdots+\alpha_rx_{rik})}.
\]
It is worth noting that the conditional logit model does not require that all individuals face the same choice sets, only that the alternatives in the choice sets can be distinguished from one another by the attribute variables.
The similarities and differences between these models and the baseline-category logit model become obvious if one looks at the log-odds relative to the first alternative in the choice set:

\[
\ln\frac{\pi_{ij}}{\pi_{i1}}
=
\alpha_{1}(x_{1ij}-x_{1i1})+\cdots+\alpha_{r}(x_{rij}-x_{ri1}).
\]
Conditional logit models appear more parsimonious than baseline-category logit models insofar as they have only one coefficient for each independent variable. In the “mclogit” package, these models can be estimated using the function mclogit() (see the relevant manual page).
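For instance, a call might look as follows (using the Transport example data shipped with the package, along the lines of the mclogit() manual page):

```r
## Usage sketch: a conditional logit for the Transport example data
library(mclogit)
data(Transport, package = "mclogit")
## cbind(resp, suburb): resp holds the counts, suburb identifies the choice sets
fit <- mclogit(cbind(resp, suburb) ~ distance + cost, data = Transport)
summary(fit)   # one coefficient per attribute variable
```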
My interest in conditional logit models derives from my research into the influence of parties' political positions on the patterns of voting. Here, the political positions are the attributes of the alternatives, and the choice sets are the sets of parties that run candidates in a country at various points in time. For an application of conditional logit models, see my doctoral thesis elff:politische.ideologien.
fitting-mclogit.Rmd
The package “mclogit” fits conditional logit models using a maximum likelihood estimator. It does this by maximizing the log-likelihood function using an iterative weighted least-squares (IWLS) algorithm, which follows the algorithm used by the glm.fit() function from the “stats” package of R.
If \(\pi_{ij}\) is the probability that individual \(i\) chooses alternative \(j\) from his/her choice set \(\mathcal{S}_i\), where

\[
\pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}
\]
and if \(y_{ij}\) is the dummy variable that equals 1 if individual \(i\) chooses alternative \(j\) and equals 0 otherwise, the log-likelihood function (given that the choices are independently identically distributed given \(\pi_{ij}\)) can be written as

\[
\ell=\sum_{i,j}y_{ij}\ln\pi_{ij}
    =\sum_{i,j}y_{ij}\eta_{ij}-\sum_i\ln\left(\sum_j\exp(\eta_{ij})\right)
\]
If the data are aggregated in terms of counts, such that \(n_{ij}\) is the number of individuals with the same choice set and the same choice probabilities \(\pi_{ij}\) that have chosen alternative \(j\), the log-likelihood is (again assuming independently identically distributed choices given \(\pi_{ij}\))

\[
\ell=\sum_{i,j}n_{ij}\ln\pi_{ij}
    =\sum_{i,j}n_{ij}\eta_{ij}-\sum_in_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right)
\]

where \(n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}\).
If

\[
\eta_{ij} =
\alpha_1x_{1ij}+\cdots+\alpha_rx_{rij}=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}
\]

then the gradient of the log-likelihood with respect to the coefficient vector \(\boldsymbol{\alpha}\) is

\[
\frac{\partial\ell}{\partial\boldsymbol{\alpha}}
=
\sum_{i,j}
\frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}}
\frac{\partial\ell}{\partial\eta_{ij}}
=
\sum_{i,j}
\boldsymbol{x}_{ij}
(n_{ij}-n_{i+}\pi_{ij})
=
\sum_{i,j}
\boldsymbol{x}_{ij}
n_{i+}
(y_{ij}-\pi_{ij})
=
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
\]
and the Hessian is

\[
\frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\,\partial\boldsymbol{\alpha}'}
=
\sum_{i,j,k}
\frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}}
\frac{\partial^2\ell}{\partial\eta_{ij}\,\partial\eta_{ik}}
\frac{\partial\eta_{ik}}{\partial\boldsymbol{\alpha}'}
=
-
\sum_{i,j,k}
\boldsymbol{x}_{ij}
n_{i+}
(\delta_{jk}\pi_{ij}-\pi_{ij}\pi_{ik})
\boldsymbol{x}_{ik}'
=
-
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\]
Here \(y_{ij}=n_{ij}/n_{i+}\), while \(\boldsymbol{N}\) is a diagonal matrix with diagonal elements \(n_{i+}\).
Newton-Raphson iterations then take the form

\[
\boldsymbol{\alpha}^{(s+1)}
=
\boldsymbol{\alpha}^{(s)}
-
\left(
\frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\,\partial\boldsymbol{\alpha}'}
\right)^{-1}
\frac{\partial\ell}{\partial\boldsymbol{\alpha}}
=
\boldsymbol{\alpha}^{(s)}
+
\left(
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\right)^{-1}
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
\]

where \(\boldsymbol{\pi}\) and \(\boldsymbol{W}\) are evaluated at \(\boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}\).
Multiplying by \(\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}\) gives

\[
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\boldsymbol{\alpha}^{(s+1)}
=
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\boldsymbol{\alpha}^{(s)}
+
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
=
\boldsymbol{X}'\boldsymbol{W}
\left(\boldsymbol{X}\boldsymbol{\alpha}^{(s)}+\boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\right)
=
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*
\]
where \(\boldsymbol{W}^-\) is a generalized inverse of \(\boldsymbol{W}\) and \(\boldsymbol{y}^*\) is a “working response vector” with elements

\[
y_{ij}^*=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}}
\]
The IWLS algorithm thus involves the following steps:

1. Create some suitable starting values for \(\boldsymbol{\pi}\), \(\boldsymbol{W}\), and \(\boldsymbol{y}^*\).

2. Construct the “working dependent variable” \(\boldsymbol{y}^*\).

3. Solve the equation

\[
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\boldsymbol{\alpha}
=
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*
\]

for \(\boldsymbol{\alpha}\).

4. Compute updated \(\boldsymbol{\eta}\), \(\boldsymbol{\pi}\), \(\boldsymbol{W}\), and \(\boldsymbol{y}^*\).

5. Compute the updated value for the log-likelihood or the deviance

\[
d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}}
\]

6. If the decrease of the deviance (or the increase of the log-likelihood) is smaller than a given tolerance criterion (typically \(\Delta d \leq 10^{-7}\)), stop the algorithm and declare it converged. Otherwise go back to step 2 with the updated value of \(\boldsymbol{\alpha}\).
The starting values for the algorithm used by the mclogit package are constructed as follows:
1. Set

\[
\eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12)
                - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln
                  (n_{ik}+\tfrac12)
\]

(where \(q_i\) is the size of the choice set \(\mathcal{S}_i\)).

2. Compute the starting values of the choice probabilities \(\pi_{ij}^{(0)}\) according to the equation at the beginning of the page.

3. Compute initial values of the working dependent variable according to

\[
y_{ij}^{*(0)}
=
\eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}}
\]
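Putting the steps and the starting values together, the IWLS loop can be condensed into a minimal, illustrative R implementation. This is a toy fitter for grouped counts; the name cl_iwls and all helper details are made up here, and the package's internal fitter is far more general:

```r
## Toy IWLS fitter for a conditional logit model with grouped counts.
## X: matrix of attribute variables, n: counts n_ij, set: choice-set id.
cl_iwls <- function(X, n, set, tol = 1e-7, maxit = 25) {
  ## weighted within-set centering: v minus its w-weighted set mean
  wcenter <- function(v, g, w) v - ave(w * v, g, FUN = sum) / ave(w, g, FUN = sum)
  n_plus <- ave(n, set, FUN = sum)                 # n_{i+}
  y <- n / n_plus                                  # y_{ij} = n_{ij} / n_{i+}
  eta <- log(n + 0.5)                              # starting values eta^(0),
  eta <- eta - ave(eta, set, FUN = mean)           # centered within choice sets
  dev <- Inf
  for (iter in seq_len(maxit)) {
    pi <- exp(eta) / ave(exp(eta), set, FUN = sum)
    w <- n_plus * pi                               # weights n_{i+} pi_{ij}
    y_star <- eta + (y - pi) / pi                  # working response
    ## W = N(diag(pi) - pi pi') is block-diagonal, so X'WX and X'W y* reduce
    ## to weighted cross-products of within-set centered variables:
    Xc <- apply(X, 2, wcenter, g = set, w = w)
    yc <- wcenter(y_star, set, w)
    alpha <- solve(crossprod(Xc, w * Xc), crossprod(Xc, w * yc))
    eta <- drop(X %*% alpha)
    pi_new <- exp(eta) / ave(exp(eta), set, FUN = sum)
    dev_new <- 2 * sum((n * log(y / pi_new))[n > 0])   # deviance
    if (abs(dev - dev_new) < tol) break            # convergence check
    dev <- dev_new
  }
  list(coefficients = drop(alpha), deviance = dev_new, iterations = iter)
}

## toy data: 100 choice sets with 3 alternatives and 2 attribute variables
set.seed(42)
d <- data.frame(set = rep(1:100, each = 3), x1 = rnorm(300), x2 = rnorm(300))
eta_true <- 1.0 * d$x1 - 0.5 * d$x2
p <- exp(eta_true) / ave(exp(eta_true), d$set, FUN = sum)
d$n <- 0L
for (s in unique(d$set)) {
  idx <- d$set == s
  d$n[idx] <- rmultinom(1, size = 20, prob = p[idx])
}
cl_iwls(as.matrix(d[, c("x1", "x2")]), d$n, d$set)  # estimates near (1, -0.5)
```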
random-effects.Rmd
The “mclogit” package allows for the presence of random effects in baseline-category logit and conditional logit models. In baseline-category logit models, the random effects may represent (unobserved) characteristics that are common to the individuals in clusters, such as regional units or electoral districts. In conditional logit models, random effects may represent attributes that are shared across several choice occasions within the same context of choice. That is, if one analyses voting behaviour across countries, then a random effect specific to the Labour party may represent unobserved attributes of this party in terms of which it differs from (or is more like) the Social Democratic Party of Germany (SPD). My original motivation for working on conditional logit models with random effects was to make it possible to assess the impact of parties’ political positions on the patterns of voting behaviour in various European countries. The results of this research are published in an article in Electoral Studies elff:divisions.positions.voting.
In its earliest incarnation, the package supported only a very simple random-intercept extension of conditional logit models (or “mixed conditional logit models”, hence the name of the package). These models can be written as

\[
\pi_{ij} =
\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}
\]

with

\[
\eta_{ij}=\sum_h\alpha_hx_{hij}+\sum_kz_{ik}b_{jk}
\]
where \(x_{hij}\) represents values of independent variables, \(\alpha_h\) are coefficients, \(z_{ik}\) are dummy variables (equal to \(1\) if \(i\) is in cluster \(k\) and equal to \(0\) otherwise), and \(b_{jk}\) are random effects with a normal distribution with expectation \(0\) and variance parameter \(\sigma^2\).
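A sketch of how such a model might be fitted follows (with simulated data). The grouping factor passed to the random argument indexes alternative-cluster combinations, in the style of the electors example on the mclogit() manual page; the exact semantics of the grouping factor should be checked against the package documentation:

```r
## Sketch: a random-intercept conditional logit on simulated data
library(mclogit)
set.seed(7)
d <- expand.grid(alt = factor(1:3), occ = 1:30, cluster = factor(1:20))
d$set <- interaction(d$cluster, d$occ)        # choice sets, nested in clusters
d$altclust <- interaction(d$alt, d$cluster)   # indexes the random effects b_jk
d$x <- rnorm(nrow(d))
b <- rnorm(nlevels(d$altclust), sd = 0.5)     # true random effects
eta <- 0.8 * d$x + b[d$altclust]
p <- exp(eta) / ave(exp(eta), d$set, FUN = sum)
d$choice <- 0L
for (s in levels(d$set)) {                    # one choice per choice set
  idx <- d$set == s
  d$choice[idx] <- rmultinom(1, size = 1, prob = p[idx])
}
fit <- mclogit(cbind(choice, set) ~ x, random = ~ 1 | altclust, data = d)
summary(fit)
```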
Later releases also added support for baseline-category logit models (initially only without random effects). In order to support random effects in baseline-category logit models, the package had to be further modified to allow for conditional logit models with random slopes (this is so because baseline-category logit models can be expressed as a particular type of conditional logit models).
It should be noted that estimating the parameters of random-effects multinomial logit models (whether of the baseline-category logit variety or the conditional logit variety) involves the considerable challenges already known from the “generalized linear mixed models” literature. The main challenge is that the likelihood function involves analytically intractable integrals (i.e. there is no way to “solve” or eliminate the integrals from the formula of the likelihood function). This means that either computationally intensive methods for the computation of such integrals have to be used, or certain approximations (most notably the Laplace approximation technique and its variants), which may lead to biases in certain situations. The “mclogit” package only supports approximate likelihood-based inference. For most of its history, only the PQL technique based on a (first-order) Laplace approximation was supported; since release 0.8, “mclogit” also supports the MQL technique, which is based on a (first-order) Solomon-Cox approximation. The ideas behind the PQL and MQL techniques are described e.g. in breslow.clayton:approximate.inference.glmm.
Elff M (2023). mclogit: Multinomial Logit Models, with or without Random Effects or Overdispersion. R package version 0.9.8, https://github.com/melff/mclogit/, http://melff.github.io/mclogit/.

@Manual{,
  title = {mclogit: Multinomial Logit Models, with or without Random Effects or Overdispersion},
  author = {Martin Elff},
  year = {2023},
  note = {R package version 0.9.8, https://github.com/melff/mclogit/},
  url = {http://melff.github.io/mclogit/},
}
This package provides estimators for multinomial logit models in their conditional logit and baseline logit variants, with or without random effects, with or without overdispersion. Random effects models are estimated using the PQL technique (based on a Laplace approximation) or the MQL technique (based on a Solomon-Cox approximation). Estimates should be treated with caution if the group sizes are small.
Releases will be published on CRAN. Pre-releases will be available here in source and binary form. To install from the sources on GitHub you can use devtools::install_github("melff/mclogit", subdir = "pkg").
More information about the package can be found at my site
dispersion(object, method, ...)
## S3 method for class 'mclogit'
dispersion(object, method = NULL, ...)
NULL, in which case the default estimator, "Afroz", is used. The estimators are discussed in Afroz et al. (2019).
an optional formula that specifies groups of observations relevant for the estimation of overdispersion. Predicted probabilities should be constant within groups; otherwise a warning is generated since the overdispersion estimate may be imprecise.
other arguments, ignored or passed to other methods.
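A usage sketch, reusing the housing example from the mblogit() documentation:

```r
## Usage sketch: estimating overdispersion after a baseline-category logit fit
library(mclogit)
data(housing, package = "MASS")
fit <- mblogit(Sat ~ Infl + Type + Cont, weights = Freq, data = housing)
dispersion(fit, method = "Afroz")   # the default "Afroz" estimator
```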
Agresti, Alan. 2002. Categorical Data Analysis. 2nd ed. Hoboken, NJ: Wiley. doi:10.1002/0471249688

Breslow, N.E. and D.G. Clayton. 1993. "Approximate Inference in Generalized Linear Mixed Models". Journal of the American Statistical Association 88 (421): 9-25. doi:10.1080/01621459.1993.10594284
"Afroz"
) is used, if FALSE
, the dispersion parameter
is set to 1, that is, no dispersion. For details see dispersion
.
an optional formula that specifies groups of observations relevant for the estimation of overdispersion. Covariates should be constant within groups; otherwise a warning is generated since the overdispersion estimate may be imprecise.
a list of parameters for the fitting process.
mclogit will complain explicitly about such a model misspecification.
From version 0.9.7 it is possible to choose the optimization technique used for the inner iterations of the PQL/MQL: either nlminb (the default), nlm, or any of the algorithms (other than "Brent") supported by optim. To choose the optimizer, use the appropriate argument for mmclogit.control.
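For example, such a call might look as follows; the argument names inner.optimizer and maxit.inner are taken from the mmclogit.control() listing below and should be treated as assumptions if your installed version differs:

```r
## Sketch: selecting the inner optimizer for PQL/MQL fits
library(mclogit)
ctrl <- mmclogit.control(inner.optimizer = "Nelder-Mead",  # instead of "nlminb"
                         maxit.inner = 200)
## then pass it to the fitting function, e.g.:
## mclogit(cbind(choice, set) ~ x, random = ~ 1 | group, data = mydata,
##         control = ctrl)
```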
logical; should a negative deviance stop the algorithm?
a character string, one of "nlminb", "nlm", "Nelder-Mead", "BFGS", "CG", "L-BFGS-B", "SANN". See nlminb, nlm, optim.
integer; the type
argument passed to
+ optim
if
+ "CG" is selected as inner optimizer.
integer; the alpha
argument passed to
+ optim
if
+ "Nelder-Mead" is selected as inner optimizer.
numeric; the beta argument passed to optim if "Nelder-Mead" is selected as inner optimizer.
numeric; the gamma argument passed to optim if "Nelder-Mead" is selected as inner optimizer.
numeric; the temp argument passed to optim if "SANN" is selected as inner optimizer.
integer; the tmax argument passed to optim if "SANN" is selected as inner optimizer.