From 3ea8ba17d27e993ce2226f90a89c18bb3740de99 Mon Sep 17 00:00:00 2001
From: melff
Laplace approximation
Since this quadratic expansion (let us call it \(\ell^*_{\text{Lapl}}(\boldsymbol{y},\boldsymbol{b})\)) is
a (multivariate) quadratic function of \(\boldsymbol{b}\), the integral of its
exponential does have a closed-form solution (the relevant formula can
-be found in
harville:matrix.algebra
).
For purposes of estimation, the resulting approximate log-likelihood is more useful:
\[
@@ -246,8 +246,7 @@ Penalized quasi-likelihood (PQL)
If one disregards the dependence of \(\tilde{\boldsymbol{H}}\) on \(\boldsymbol{\alpha}\) and \(\boldsymbol{b}\), then \(\tilde{\boldsymbol{b}}\) maximizes not only
\(\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\)
but also \(\ell^*_{\text{Lapl}}\). This
-motivates the following IWLS/Fisher scoring equations for \(\hat{\boldsymbol{\alpha}}\) and \(\tilde{\boldsymbol{b}}\) (see
-
breslow.clayton:approximate.inference.glmm
and this page):
\[
\begin{aligned}
\begin{bmatrix}
@@ -288,7 +287,7 @@ Penalized quasi-likelihood (PQL)
-which can be solved to compute \(hat{\boldsymbol{\alpha}}\) and \(\tilde{\boldsymbol{b}}\) (for given \(\boldsymbol{\Sigma}\))
+which can be solved to compute \(\hat{\boldsymbol{\alpha}}\) and \(\tilde{\boldsymbol{b}}\) (for given \(\boldsymbol{\Sigma}\))
Here
\[
\boldsymbol{V} =
@@ -299,8 +298,8 @@ Penalized quasi-likelihood (PQL)
Following breslow.clayton:approximate.inference.glmm
the
-variance parameters in \(\boldsymbol{Sigma}\) are estimated by
+Following Breslow and Clayton (1993)
+the variance parameters in \(\boldsymbol{\Sigma}\) are estimated by
minimizing
\[
q_1 =
@@ -313,7 +312,7 @@ Penalized quasi-likelihood (PQL)
This motivates the following algorithm, which is strongly inspired by
the glmmPQL()
function in Brian Ripley’s R package
-MASS:
lme()
function from package nlme for this,
-because the weighting matrix \(\boldsymbol{W}\) is non-diagonal. Instead,
-\(q_1\) or \(q_2\) are minimized using the function
-nlminb
from the standard R package “stats”.
+lme()
function from package nlme (Pinheiro and Bates 2000) for this, because the
+weighting matrix \(\boldsymbol{W}\) is
+non-diagonal. Instead, \(q_1\) or \(q_2\) are minimized using the function
+nlminb
from the standard R package “stats” or some
+other optimizer chosen by the user.
-The (first-order) Solomon approximation is based on the quadratic
+The (first-order) Solomon approximation (Solomon and Cox 1992) is based on the quadratic
expansion of the integrand
\[
\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\approx
@@ -392,13 +392,12 @@ The Solomon-Cox approximation
Marginal quasi-likelihood (MQL)
-The resulting estimation technique is very similar to PQL (again, see
-breslow.clayton:approximate.inference.glmm
-for a
-discussion). The only difference is the construction of the “working
-dependent” variable \(\boldsymbol{y}^*\). With PQL it is
+The resulting estimation technique is very similar to PQL (again, see Breslow and Clayton 1993 for a
+discussion). The only difference is the construction of the
+“working dependent” variable \(\boldsymbol{y}^*\). With PQL it is
constructed as \[\boldsymbol{y}^* =
\boldsymbol{X}\boldsymbol{\alpha} + \boldsymbol{Z}\boldsymbol{b} +
-\boldsymbol{W}^{-}(\boldsymbol{y}-\boldsymbol{pi})\] while the
+\boldsymbol{W}^{-}(\boldsymbol{y}-\boldsymbol{\pi})\] while the
MQL working dependent variable is just
\[
\boldsymbol{y}^* = \boldsymbol{X}\boldsymbol{\alpha} +
@@ -424,6 +423,33 @@ Marginal quasi-likelihood (MQL)
\(\hat{\boldsymbol{\alpha}}\)
.
agresti:categorical.data.analysis.2002
. Estimating
-these models is also supported by the function multinom()
-in the R package "nnet" MASS
. In the package "mclogit", the function to
-estimate these models is called mblogit()
(see the relevant
-manual page), which uses the
-infrastructure for estimating conditional logit models, exploiting the
-fact that baseline-category logit models can be re-expressed as
-condigional logit models.
+These models are described in Agresti
+(2002). Estimating these models is also supported by the function
+multinom()
in the R package “nnet” (Venables and Ripley 2002). In the package
+“mclogit”, the function to estimate these models is called
+mblogit()
, which uses the infrastructure for estimating
+conditional logit models, exploiting the fact that baseline-category
+logit models can be re-expressed as conditional logit models.
Baseline-category logit models are constructed as follows. Suppose a categorical dependent variable or response with categories \(j=1,\ldots,q\) is observed for individuals \(i=1,\ldots,n\). Let \(\pi_{ij}\) denote the probability that the
@@ -171,11 +170,27 @@
Conditional logit models are motivated by a variety of
considerations, notably as a way to model binary panel data or responses
in case-control studies. The variant supported by the package “mclogit”
-is motivated by the analysis of discrete choices and goes back to mcfadden:conditional.logit
.
-Here, a series of individuals \(i=1,ldots,n\) is observed to have made a
-choice (represented by a number \(j\))
-from a choice set \(\mathcal{S}_i\),
-the set of alternatives at the individual’s disposal. Each alternatives
-\(j\) in the choice set can be
-described by the values \(x_{1ij},\ldots,x_{1ij}\) of \(r\) attribute variables (where the
+is motivated by the analysis of discrete choices and goes back to McFadden (1974). Here, a series of individuals
+\(i=1,\ldots,n\) is observed to have
+made a choice (represented by a number \(j\)) from a choice set \(\mathcal{S}_i\), the set of alternatives at
+the individual’s disposal. Each alternative \(j\) in the choice set can be described by
+the values \(x_{1ij},\ldots,x_{rij}\)
+of \(r\) attribute variables (where the
variables are enumerated as \(h=1,\ldots,r\)). (Note that in contrast to
the baseline-category logit model, these values vary between choice
alternatives.) Conditional logit models then posit that individual \(i\) chooses alternative \(j\) from his or her choice set \(\mathcal{S}_i\) with probability
Conditional logit models appear more parsimonious than
baseline-category logit models insofar as they have only one
coefficient for each independent variable.[^1] In the “mclogit” package,
-these models can be estimated using the function mclogit()
-(see the relevant manual page).
mclogit()
.
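The choice probability posited by the conditional logit model can be illustrated with a small sketch. It is written in Python rather than R and is not part of the “mclogit” package; the function name `conditional_logit_probs`, the coefficient vector `alpha`, and the example attribute values are hypothetical. It assumes the standard McFadden form, in which the probability of an alternative is the exponential of its linear predictor divided by the sum of such exponentials over the choice set.

```python
import math

def conditional_logit_probs(x_rows, alpha):
    """Choice probabilities for one individual under a conditional logit model.

    x_rows : list of rows, one per alternative in the choice set; each row
             holds the r attribute values of that alternative.
    alpha  : list of r coefficients, one per attribute variable.
    """
    # Linear predictor for every alternative in the choice set.
    eta = [sum(a * x for a, x in zip(alpha, row)) for row in x_rows]
    # Subtracting the maximum avoids overflow and leaves the ratios unchanged.
    m = max(eta)
    w = [math.exp(e - m) for e in eta]
    total = sum(w)
    return [v / total for v in w]

# Hypothetical choice set: three alternatives described by two attributes.
x_rows = [[1.0, 0.5],
          [0.0, 1.5],
          [2.0, 0.0]]
alpha = [0.8, -0.4]
probs = conditional_logit_probs(x_rows, alpha)

# Shifting every attribute value by the same constant leaves the probabilities
# unchanged: only differences between alternatives matter.
shifted = conditional_logit_probs([[x + 5.0 for x in row] for row in x_rows], alpha)
```

Because there is a single coefficient per attribute, the same `alpha` applies to every alternative; this is the parsimony mentioned above.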
My interest in conditional logit models derives from my research into
the influence of parties' political positions on the patterns of voting.
Here, the political positions are the attributes of the alternatives and
the choice sets are the sets of parties that run candidates in a
country at various points in time. For the application of the
-conditional logit models, see my doctoral thesis elff:politische.ideologien
.
glm.fit()
function from the “stats” package of
-R.
+glm.fit()
function from the “stats” package of R
+(Nelder and Wedderburn 1972; McCullagh and Nelder
+1989; R Core Team 2023).
If \(\pi_{ij}\) is the probability that individual \(i\) chooses alternative \(j\) from his/her choice
@@ -202,7 +203,7 @@
Here \(y_{ij}=n_{ij}/n_{i+}\), while
-\(boldsymbol{N}\) is a diagonal matrix
+\(\boldsymbol{N}\) is a diagonal matrix
with diagonal elements \(n_{i+}\).
Newton-Raphson iterations then take the form
\[
@@ -298,11 +299,34 @@ The IWLS algorithm used to fit conditional logit
\]
elff:divisions.positions.voting
.
+Elff (2009).
In its earliest incarnation, the package supported only a very simple random-intercept extension of conditional logit models (or “mixed conditional logit models”, hence the name of the package). These models
@@ -183,12 +183,30 @@
breslow.clayton:approximate.inference.glmm
.
+the PQL and MQL techniques are described e.g. in Breslow and Clayton (1993).
+