diff --git a/.github/workflows/pkgdown.yml b/.github/workflows/pkgdown.yml
index e16a27c..c08c6f5 100644
--- a/.github/workflows/pkgdown.yml
+++ b/.github/workflows/pkgdown.yml
@@ -38,7 +38,7 @@ jobs:
         run: cp -a README.md pkg
       - name: Build site
-        run: pkgdown::build_site_github_pages(pkg = "pkg", dest_dir = "../docs", new_process = FALSE, install = FALSE)
+        run: install.packages("preferably"); pkgdown::build_site_github_pages(pkg = "pkg", dest_dir = "../docs", new_process = FALSE, install = FALSE)
         shell: Rscript {0}
       - name: Deploy to GitHub pages 🚀
diff --git a/README.md b/README.md
index 33f7307..5ba4c3e 100644
--- a/README.md
+++ b/README.md
@@ -8,8 +8,11 @@
-This packages provides estimators for multinomial logit models in their conditional logit and baseline logit variants, with or without random effects, with or without overdispersion. Random effects models are estimated using the PQL technique (based on a Laplace approximation) or the MQL technique (based on a Solomon-Cox approximation). Estimates should be treated with caution if the group sizes are small.
+This package provides estimators for multinomial logit models in their
+conditional logit and baseline logit variants, with or without random effects,
+with or without overdispersion. Random effects models are estimated using the
+PQL technique (based on a Laplace approximation) or the MQL technique (based on
+a Solomon-Cox approximation). Estimates should be treated with caution if the
+group sizes are small.
-Releases will be published on [CRAN](http://cran.r-project.org/package=mclogit). Pre-releases will be available here in source and binary form. To install from the sources on GitHub you can use `devtools::install_github("melff/mclogit",subdir="pkg")`.
-More information about the package can be found at [my site](http://www.elff.eu/software/mclogit/)
diff --git a/pkg/DESCRIPTION b/pkg/DESCRIPTION
index 572f837..b51c251 100644
--- a/pkg/DESCRIPTION
+++ b/pkg/DESCRIPTION
@@ -2,7 +2,7 @@ Package: mclogit
 Type: Package
 Title: Multinomial Logit Models, with or without Random Effects or Overdispersion
 Version: 0.9.8
-Date: 2023-04-05
+Date: 2023-12-27
 Author: Martin Elff
 Maintainer: Martin Elff
 Description: Provides estimators for multinomial logit models in their
@@ -24,6 +24,6 @@ Suggests:
 Enhances: emmeans
 LazyLoad: Yes
-URL: http://mclogit.elff.eu,https://github.com/melff/mclogit/
+URL: http://melff.github.io/mclogit/,https://github.com/melff/mclogit/
 BugReports: https://github.com/melff/mclogit/issues
 RoxygenNote: 7.1.2
diff --git a/pkg/pkgdown/_pkgdown.yml b/pkg/pkgdown/_pkgdown.yml
index c7f235a..59cfd5b 100644
--- a/pkg/pkgdown/_pkgdown.yml
+++ b/pkg/pkgdown/_pkgdown.yml
@@ -1,4 +1,17 @@
 url: https://melff.github.io/mclogit
 template:
-  bootstrap: 5
-  bootswatch: sandstone
+  package: preferably
+articles:
+- title: The statistical models
+  navbar: The statistical models
+  contents:
+  - conditional-logit
+  - baseline-logit
+  - baseline-and-conditional-logit
+  - random-effects
+- title: Technical aspects of model fitting
+  navbar: Technical aspects of model fitting
+  contents:
+  - fitting-mclogit
+  - approximations
diff --git a/pkg/vignettes/approximations.Rmd b/pkg/vignettes/approximations.Rmd
new file mode 100644
index 0000000..f5b1642
--- /dev/null
+++ b/pkg/vignettes/approximations.Rmd
@@ -0,0 +1,332 @@
---
title: Approximate Inference for Multinomial Logit Models with Random Effects
output: rmarkdown::html_vignette
vignette: >
  % \VignetteIndexEntry{Approximate Inference for Multinomial Logit Models with Random
Effects}
  % \VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

# The problem

A crucial problem for inference about non-linear models with random
effects is that the likelihood function for such models involves
integrals for which no analytical solution exists.

For given values $\boldsymbol{b}$ of the random effects, the likelihood
function of a conditional logit model (and therefore also of a
baseline-logit model) can be written in the form

$$
\mathcal{L}_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})
=
\exp\left(\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\right)
=\exp
\left(
\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})
-\frac12\ln\det(\boldsymbol{\Sigma})
-\frac12\boldsymbol{b}'\boldsymbol{\Sigma}^{-1}\boldsymbol{b}
\right)
$$

However, this "complete data" likelihood function cannot be used for
inference, because it depends on the unobserved random effects. To
arrive at a likelihood function that depends only on observed data, one
needs to use the following integrated likelihood function:

$$
\mathcal{L}_{\text{obs}}(\boldsymbol{y})
=
\int
\exp\left(\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\right)
\,d\boldsymbol{b}
=
\int
\exp
\left(
\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})
-\frac12\ln\det(\boldsymbol{\Sigma})
-\frac12\boldsymbol{b}'\boldsymbol{\Sigma}^{-1}\boldsymbol{b}
\right)
\,d\boldsymbol{b}
$$

In general, this integral cannot be "solved", i.e. eliminated from the
formula by analytic means (it is "analytically intractable"). Instead,
one has to compute it using numerical techniques (e.g. numerical
quadrature) or approximate it using analytical techniques. Unless there
is only a single level of random effects, numerical quadrature can
become computationally demanding; that is, the computation of the
(log-)likelihood function and its derivatives can take a lot of time
even on modern, state-of-the-art computer hardware. Approximations
based on analytical techniques, on the other hand, may lead to biased
estimates, in particular in samples where the number of observations
relative to the number of random effects is small, but they are much
easier to compute and sometimes make inference possible in the first
place.

The package "mclogit" supports two kinds of analytical approximations,
the Laplace approximation and what one may call the Solomon-Cox
approximation. Both approximations are based on a quadratic expansion of
the integrand, so that the modified integral has a closed-form
solution, i.e. is analytically tractable.
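To see what is at stake, consider the simplest case of a single scalar
random effect per cluster. The following R snippet is a minimal sketch,
not part of the package, using a made-up toy conditional log-likelihood:
it computes the integrated likelihood of one cluster by one-dimensional
numerical integration over the random effect.

```r
# Toy conditional log-likelihood ell(y | b; alpha) for one cluster:
# three binary responses with linear predictor eta = 0.5 + b.
loglik_cond <- function(b) {
  y <- c(1, 0, 1)
  eta <- 0.5 + b
  sum(y * eta - log(1 + exp(eta)))
}

# Integrated likelihood: integrate exp(ell(y | b)) against the
# N(0, sigma^2) density of the random effect b.
integrated_lik <- function(sigma) {
  integrand <- function(b) {
    vapply(b, function(bi) exp(loglik_cond(bi)) * dnorm(bi, sd = sigma),
           numeric(1))
  }
  integrate(integrand, lower = -Inf, upper = Inf)$value
}

integrated_lik(sigma = 1)
```

With a single random effect such one-dimensional integrals are cheap,
but with nested or crossed random effects the integral becomes
multi-dimensional, which is why the analytical approximations described
below are attractive.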
# The Laplace approximation and PQL

## Laplace approximation

The (first-order) Laplace approximation is based on a quadratic
expansion of the logarithm of the integrand, the complete-data
log-likelihood:

$$
\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\approx
\ell(\boldsymbol{y}|\tilde{\boldsymbol{b}};\boldsymbol{\alpha})
-
\frac12
(\boldsymbol{b}-\tilde{\boldsymbol{b}})'
\tilde{\boldsymbol{H}}
(\boldsymbol{b}-\tilde{\boldsymbol{b}})
-\frac12\ln\det(\boldsymbol{\Sigma})
-\frac12\tilde{\boldsymbol{b}}'\boldsymbol{\Sigma}^{-1}\tilde{\boldsymbol{b}}
-\frac12(\boldsymbol{b}-\tilde{\boldsymbol{b}})'\boldsymbol{\Sigma}^{-1}(\boldsymbol{b}-\tilde{\boldsymbol{b}})
$$

where $\tilde{\boldsymbol{b}}$ is the solution to

$$
\frac{\partial\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})}{\partial\boldsymbol{b}}
= 0
$$

and $\tilde{\boldsymbol{H}}=\boldsymbol{H}(\tilde{\boldsymbol{b}})$ is the
value of the negative Hessian with respect to $\boldsymbol{b}$,

$$
\boldsymbol{H}(\boldsymbol{b})=-\frac{\partial^2\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})}{\partial\boldsymbol{b}\,\partial\boldsymbol{b}'}
$$

for $\boldsymbol{b}=\tilde{\boldsymbol{b}}$.

Since this quadratic expansion---let us call it
$\ell_{\text{Lapl}}(\boldsymbol{y},\boldsymbol{b})$---is a
(multivariate) quadratic function of $\boldsymbol{b}$, the integral of
its exponential does have a closed-form solution (the relevant formula
can be found in `harville:matrix.algebra`).

For purposes of estimation, the resulting approximate log-likelihood is
more useful:

$$
\ell^*_{\text{Lapl}}
=
\ln\int \exp(\ell_{\text{Lapl}}(\boldsymbol{y},\boldsymbol{b})) \,d\boldsymbol{b}
=
\ell(\boldsymbol{y}|\tilde{\boldsymbol{b}};\boldsymbol{\alpha})
-
\frac12\tilde{\boldsymbol{b}}'\boldsymbol{\Sigma}^{-1}\tilde{\boldsymbol{b}}
-
\frac12\ln\det(\boldsymbol{\Sigma})
-
\frac12\ln\det\left(\tilde{\boldsymbol{H}}+\boldsymbol{\Sigma}^{-1}\right).
$$

## Penalized quasi-likelihood (PQL)

If one disregards the dependence of $\tilde{\boldsymbol{H}}$ on
$\boldsymbol{\alpha}$ and $\boldsymbol{b}$, then
$\tilde{\boldsymbol{b}}$ maximizes not only
$\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})$ but also
$\ell^*_{\text{Lapl}}$. This motivates the following IWLS/Fisher
scoring equations for $\hat{\boldsymbol{\alpha}}$ and
$\tilde{\boldsymbol{b}}$ (see
`breslow.clayton:approximate.inference.glmm` and [this
page](fitting-mclogit.html)):

$$
\begin{aligned}
\begin{bmatrix}
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X} & \boldsymbol{X}'\boldsymbol{W}\boldsymbol{Z} \\
\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{X} & \boldsymbol{Z}'\boldsymbol{W}\boldsymbol{Z} + \boldsymbol{\Sigma}^{-1}\\
\end{bmatrix}
\begin{bmatrix}
 \hat{\boldsymbol{\alpha}}\\
 \tilde{\boldsymbol{b}}\\
\end{bmatrix}
 =
\begin{bmatrix}
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*\\
\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{y}^*
\end{bmatrix}
\end{aligned}
$$

where

$$
\boldsymbol{y}^* = \boldsymbol{X}\boldsymbol{\alpha} +
\boldsymbol{Z}\boldsymbol{b} +
\boldsymbol{W}^{-}(\boldsymbol{y}-\boldsymbol{\pi})
$$

is the IWLS "working dependent variable", with $\boldsymbol{\alpha}$,
$\boldsymbol{b}$, $\boldsymbol{W}$, and $\boldsymbol{\pi}$ computed in
an earlier iteration.
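This block system can be solved directly. The following R sketch shows
one such Fisher scoring step; `X`, `Z`, `W`, `Sigma`, and `ystar` are
assumed inputs, and the function is an illustration rather than the
package's actual implementation.

```r
# One joint IWLS/Fisher scoring step: solve the mixed-model equations
# for (alpha-hat, b-tilde), given the current W, Sigma and working
# response ystar. All argument names are placeholders.
pql_step <- function(X, Z, W, Sigma, ystar) {
  XtW <- crossprod(X, W)                       # X'W
  ZtW <- crossprod(Z, W)                       # Z'W
  K <- rbind(
    cbind(XtW %*% X, XtW %*% Z),               # [X'WX  X'WZ           ]
    cbind(ZtW %*% X, ZtW %*% Z + solve(Sigma)) # [Z'WX  Z'WZ + Sigma^-1]
  )
  rhs <- c(XtW %*% ystar, ZtW %*% ystar)       # (X'Wy*, Z'Wy*)
  theta <- solve(K, rhs)                       # stacked (alpha-hat, b-tilde)
  p <- ncol(X)
  list(alpha = theta[seq_len(p)], b = theta[-seq_len(p)])
}
```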
Substitution leads to the equations

$$
(\boldsymbol{X}'\boldsymbol{V}^-\boldsymbol{X})\hat{\boldsymbol{\alpha}} =
\boldsymbol{X}'\boldsymbol{V}^-\boldsymbol{y}^*
$$

and

$$
(\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{Z} +
\boldsymbol{\Sigma}^{-1})\tilde{\boldsymbol{b}} =
\boldsymbol{Z}'\boldsymbol{W}(\boldsymbol{y}^*-\boldsymbol{X}\hat{\boldsymbol{\alpha}})
$$

which can be solved to compute $\hat{\boldsymbol{\alpha}}$ and
$\tilde{\boldsymbol{b}}$ (for given $\boldsymbol{\Sigma}$).

Here

$$
\boldsymbol{V} =
\boldsymbol{W}^-+\boldsymbol{Z}\boldsymbol{\Sigma}\boldsymbol{Z}'
$$

and

$$
\boldsymbol{V}^- = \boldsymbol{W}-
\boldsymbol{W}\boldsymbol{Z}\left(\boldsymbol{Z}'\boldsymbol{W}\boldsymbol{Z}+\boldsymbol{\Sigma}^{-1}\right)^{-1}\boldsymbol{Z}'\boldsymbol{W}
$$

Following `breslow.clayton:approximate.inference.glmm`, the variance
parameters in $\boldsymbol{\Sigma}$ are estimated by minimizing

$$
q_1 =
\ln\det(\boldsymbol{V})+(\boldsymbol{y}^*-\boldsymbol{X}\boldsymbol{\alpha})'\boldsymbol{V}^-(\boldsymbol{y}^*-\boldsymbol{X}\boldsymbol{\alpha})
$$

or the "REML" variant:

$$
q_2 =
\ln\det(\boldsymbol{V})+(\boldsymbol{y}^*-\boldsymbol{X}\boldsymbol{\alpha})'\boldsymbol{V}^-(\boldsymbol{y}^*-\boldsymbol{X}\boldsymbol{\alpha})+\ln\det(\boldsymbol{X}'\boldsymbol{V}^{-}\boldsymbol{X})
$$

This motivates the following algorithm, which is strongly inspired by
the `glmmPQL()` function in Brian Ripley's *R* package
[MASS](https://cran.r-project.org/package=MASS):

1. Create some suitable starting values for $\boldsymbol{\pi}$,
   $\boldsymbol{W}$, and $\boldsymbol{y}^*$
2. Construct the "working dependent variable" $\boldsymbol{y}^*$
3. Minimize $q_1$ (quasi-ML) or $q_2$ (quasi-REML) iteratively
   (inner loop), to obtain an estimate of $\boldsymbol{\Sigma}$
4. Obtain $\hat{\boldsymbol{\alpha}}$ and $\tilde{\boldsymbol{b}}$ based
   on the current estimate of $\boldsymbol{\Sigma}$
5. Compute updated $\boldsymbol{\eta}=\boldsymbol{X}\boldsymbol{\alpha}
   + \boldsymbol{Z}\boldsymbol{b}$, $\boldsymbol{\pi}$, $\boldsymbol{W}$.
6. If the change in $\boldsymbol{\eta}$ is smaller than a given
   tolerance criterion, stop the algorithm and declare it converged.
   Otherwise go back to step 2 with the updated values of
   $\hat{\boldsymbol{\alpha}}$ and $\tilde{\boldsymbol{b}}$.

This algorithm is a modification of the [IWLS](fitting-mclogit.html)
algorithm used to fit conditional logit models without random effects.
Instead of just solving a linear equation in step 3, it estimates a
weighted linear mixed-effects model. In contrast to `glmmPQL()` it does
not use the `lme()` function from package
[nlme](https://cran.r-project.org/package=nlme) for this, because the
weighting matrix $\boldsymbol{W}$ is non-diagonal. Instead, $q_1$ or
$q_2$ are minimized using the function `nlminb` from the standard *R*
package "stats".
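As a rough illustration of the inner loop, the sketch below minimizes
$q_1$ with `nlminb()` for a model with a single variance parameter,
i.e. $\boldsymbol{\Sigma}=\sigma^2\boldsymbol{I}$. It treats `X`, `Z`,
`W`, `ystar`, and `alpha` as given, and for simplicity assumes
$\boldsymbol{W}$ to be invertible (the package works with a generalized
inverse); none of this is the package's actual code.

```r
# Quasi-ML criterion q1 as a function of a single variance parameter
# sigma2, with Sigma = sigma2 * I; X, Z, W, ystar, alpha assumed given.
q1_fun <- function(sigma2, X, Z, W, ystar, alpha) {
  V <- solve(W) + sigma2 * tcrossprod(Z)   # V = W^{-1} + Z Sigma Z'
  r <- ystar - X %*% alpha                 # residual y* - X alpha
  as.numeric(determinant(V, logarithm = TRUE)$modulus) +
    drop(crossprod(r, solve(V, r)))        # ln det(V) + r' V^{-1} r
}

# Inner loop: one-dimensional minimization over sigma2 > 0, e.g.
# fit <- nlminb(start = 1, objective = q1_fun, lower = 1e-8,
#               X = X, Z = Z, W = W, ystar = ystar, alpha = alpha)
```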
# The Solomon-Cox approximation and MQL

## The Solomon-Cox approximation

The (first-order) Solomon-Cox approximation is based on a quadratic
expansion of the integrand about $\boldsymbol{b}=\boldsymbol{0}$:

$$
\ell_{\text{cpl}}(\boldsymbol{y},\boldsymbol{b})\approx
\ell(\boldsymbol{y}|\boldsymbol{0};\boldsymbol{\alpha})
+
\boldsymbol{g}_0'
\boldsymbol{b}
-
\frac12
\boldsymbol{b}'
\boldsymbol{H}_0
\boldsymbol{b}
-\frac12\ln\det(\boldsymbol{\Sigma})
-\frac12\boldsymbol{b}'\boldsymbol{\Sigma}^{-1}\boldsymbol{b}
$$

where $\boldsymbol{g}_0=\boldsymbol{g}(\boldsymbol{0})$ is the gradient
of $\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})$,

$$
\boldsymbol{g}(\boldsymbol{b})=\frac{\partial\ell(\boldsymbol{y}|\boldsymbol{b};\boldsymbol{\alpha})}{\partial\boldsymbol{b}}
$$

at $\boldsymbol{b}=\boldsymbol{0}$, while
$\boldsymbol{H}_0=\boldsymbol{H}(\boldsymbol{0})$ is the negative
Hessian at $\boldsymbol{b}=\boldsymbol{0}$.

Like before, the integral of the exponential of this quadratic expansion
(which we refer to as
$\ell_{\text{SC}}(\boldsymbol{y},\boldsymbol{b})$) has a closed-form
solution, as does its logarithm, which is:

$$
\ln\int \exp(\ell_{\text{SC}}(\boldsymbol{y},\boldsymbol{b})) \,d\boldsymbol{b}
=
\ell(\boldsymbol{y}|\boldsymbol{0};\boldsymbol{\alpha})
+
\frac12\boldsymbol{g}_0'\left(\boldsymbol{H}_0+\boldsymbol{\Sigma}^{-1}\right)^{-1}\boldsymbol{g}_0
-
\frac12\ln\det(\boldsymbol{\Sigma})
-
\frac12\ln\det\left(\boldsymbol{H}_0+\boldsymbol{\Sigma}^{-1}\right).
$$

## Marginal quasi-likelihood (MQL)

The resulting estimation technique is very similar to PQL (again, see
`breslow.clayton:approximate.inference.glmm` for a discussion). The only
difference is the construction of the "working dependent variable"
$\boldsymbol{y}^*$. With PQL it is constructed as

$$
\boldsymbol{y}^* =
\boldsymbol{X}\boldsymbol{\alpha} + \boldsymbol{Z}\boldsymbol{b} +
\boldsymbol{W}^{-}(\boldsymbol{y}-\boldsymbol{\pi})
$$

while the MQL working dependent variable is just

$$
\boldsymbol{y}^* = \boldsymbol{X}\boldsymbol{\alpha} +
\boldsymbol{W}^{-}(\boldsymbol{y}-\boldsymbol{\pi})
$$

so that the algorithm has the following steps:

1. Create some suitable starting values for $\boldsymbol{\pi}$,
   $\boldsymbol{W}$, and $\boldsymbol{y}^*$
2. Construct the "working dependent variable" $\boldsymbol{y}^*$
3. Minimize $q_1$ (quasi-ML) or $q_2$ (quasi-REML) iteratively
   (inner loop), to obtain an estimate of $\boldsymbol{\Sigma}$
4. Obtain $\hat{\boldsymbol{\alpha}}$ based on the current estimate of
   $\boldsymbol{\Sigma}$
5. Compute updated $\boldsymbol{\eta}=\boldsymbol{X}\boldsymbol{\alpha}$,
   $\boldsymbol{\pi}$, $\boldsymbol{W}$.
6. If the change in $\boldsymbol{\eta}$ is smaller than a given
   tolerance criterion, stop the algorithm and declare it converged.
   Otherwise go back to step 2 with the updated values of
   $\hat{\boldsymbol{\alpha}}$.

diff --git a/pkg/vignettes/baseline-and-conditional-logit.Rmd b/pkg/vignettes/baseline-and-conditional-logit.Rmd
new file mode 100644
index 0000000..fb9b0ee
--- /dev/null
+++ b/pkg/vignettes/baseline-and-conditional-logit.Rmd
@@ -0,0 +1,81 @@
---
title: The relation between baseline logit and conditional logit models
output: rmarkdown::html_vignette
vignette: >
  % \VignetteIndexEntry{The relation between baseline logit and conditional logit models}
  % \VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Baseline-category logit models can be expressed as a particular form of
conditional logit models.
In a conditional logit model (without random
effects) the probability that individual $i$ chooses alternative $j$
from choice set $\mathcal{S}_i$ is

$$
\pi_{ij} = \frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}
$$

where

$$
\eta_{ij} = \alpha_1x_{1ij}+\cdots+\alpha_qx_{qij}
$$

In a baseline-category logit model, the set of alternatives is the same
for all individuals $i$, that is, $\mathcal{S}_i = \{1,\ldots,q\}$, and
the linear part of the model can be written as

$$
\eta_{ij} = \beta_{j0}+\beta_{j1}x_{1i}+\cdots+\beta_{jr}x_{ri}
$$

where the coefficients in the equation for the baseline category $j=1$
are all zero, i.e.

$$
\beta_{10} = \cdots = \beta_{1r} = 0
$$

After setting

$$
\begin{aligned}
x_{(g\times(j-1))ij} = d_{gj}, \quad
x_{(g\times(j-1)+h)ij} = d_{gj}x_{hi}, \qquad
\text{with } d_{gj}=
\begin{cases}
0&\text{for } j\neq g \text{ or } j=g=1\\
1&\text{for } j=g\neq 1\\
\end{cases}
\end{aligned}
$$

we have for the log-odds:

$$
\begin{aligned}
\ln\frac{\pi_{ij}}{\pi_{i1}}
&=\beta_{j0}+\beta_{j1}x_{1i}+\cdots+\beta_{jr}x_{ri}
\\
&=\sum_{h}\beta_{jh}x_{hi}=\sum_{g,h}\beta_{jh}d_{gj}x_{hi}
=\sum_{g,h}\alpha_{g\times(j-1)+h}(d_{gj}x_{hi}-d_{g1}x_{hi})
=\sum_{g,h}\alpha_{g\times(j-1)+h}(x_{(g\times(j-1)+h)ij}-x_{(g\times(j-1)+h)i1})\\
&=\alpha_{1}(x_{1ij}-x_{1i1})+\cdots+\alpha_{s}(x_{sij}-x_{si1})
\end{aligned}
$$

where $\alpha_1=\beta_{21}$, $\alpha_2=\beta_{22}$, etc.

That is, the baseline-category logit model is translated into a
conditional logit model where the alternative-specific values of the
attribute variables are interaction terms composed of
alternative-specific dummies and individual-specific values of
characteristic variables.

Analogously, the random-effects extension of the baseline-logit model
can be translated into a random-effects conditional logit model where
the random intercepts in the logit equations of the baseline-logit model
are translated into random slopes of category-specific dummy variables.

diff --git a/pkg/vignettes/baseline-logit.Rmd b/pkg/vignettes/baseline-logit.Rmd
new file mode 100644
index 0000000..0166001
--- /dev/null
+++ b/pkg/vignettes/baseline-logit.Rmd
@@ -0,0 +1,63 @@
---
title: Baseline-category logit models
output: rmarkdown::html_vignette
vignette: >
  % \VignetteIndexEntry{Baseline-category logit models}
  % \VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Multinomial baseline-category logit models are a generalisation of
logistic regression that allows one to model not only binary or
dichotomous responses, but also polychotomous responses. In addition,
they allow one to model responses in the form of counts that have a
pre-determined sum. These models are described in
`agresti:categorical.data.analysis.2002`{.interpreted-text
role="citet"}. Estimating these models is also supported by the function
`multinom()` in the *R* package "nnet" `MASS`{.interpreted-text
role="cite"}. In the package "mclogit", the function to estimate these
models is called `mblogit()` (see the relevant [manual
page](reference/mblogit.html)), which uses the infrastructure for estimating
conditional logit models, exploiting the fact that baseline-category
logit models can be re-expressed as conditional logit models.

Baseline-category logit models are constructed as follows.
Suppose a
categorical dependent variable or response with categories
$j=1,\ldots,q$ is observed for individuals $i=1,\ldots,n$. Let
$\pi_{ij}$ denote the probability that the value of the dependent
variable for individual $i$ is equal to $j$; then the
baseline-category logit model takes the form:

$$
\begin{aligned}
\pi_{ij} =
\begin{cases}
\dfrac{\exp(\alpha_{j0}+\alpha_{j1}x_{1i}+\cdots+\alpha_{jr}x_{ri})}
{1+\sum_{k>1}\exp(\alpha_{k0}+\alpha_{k1}x_{1i}+\cdots+\alpha_{kr}x_{ri})}
& \text{for } j>1\\[20pt]
\dfrac{1}
{1+\sum_{k>1}\exp(\alpha_{k0}+\alpha_{k1}x_{1i}+\cdots+\alpha_{kr}x_{ri})}
& \text{for } j=1
\end{cases}
\end{aligned}
$$

where the first category ($j=1$) is the baseline category.

Equivalently, the model can be expressed in terms of log-odds, relative
to the baseline category:

$$
\ln\frac{\pi_{ij}}{\pi_{i1}}
=
\alpha_{j0}+\alpha_{j1}x_{1i}+\cdots+\alpha_{jr}x_{ri}.
$$

Here the relevant parameters of the model are the coefficients
$\alpha_{jk}$, which describe how the values of the independent variables
(numbered $k=1,\ldots,r$) affect the relative chances of the response
taking the value $j$ versus taking the value $1$. Note that there is
one coefficient for each independent variable and *each response
category* other than the baseline category.

diff --git a/pkg/vignettes/conditional-logit.Rmd b/pkg/vignettes/conditional-logit.Rmd
new file mode 100644
index 0000000..0ed0e6b
--- /dev/null
+++ b/pkg/vignettes/conditional-logit.Rmd
@@ -0,0 +1,59 @@
---
title: Conditional logit models
output: rmarkdown::html_vignette
vignette: >
  % \VignetteIndexEntry{Conditional logit models}
  % \VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

Conditional logit models are motivated by a variety of considerations,
notably as a way to model binary panel data or responses in
case-control studies. The variant supported by the package "mclogit"
is motivated by the analysis of discrete choices and goes back to
`mcfadden:conditional.logit`{.interpreted-text role="citet"}. Here, a
series of individuals $i=1,\ldots,n$ is observed to have made a choice
(represented by a number $j$) from a choice set $\mathcal{S}_i$, the
set of alternatives at the individual's disposal. Each alternative
$j$ in the choice set can be described by the values
$x_{1ij},\ldots,x_{rij}$ of $r$ attribute variables (where the
variables are enumerated as $h=1,\ldots,r$). (Note that, in contrast to
the baseline-category logit model, these values vary between choice
alternatives.) Conditional logit models then posit that individual $i$
chooses alternative $j$ from his or her choice set $\mathcal{S}_i$
with probability

$$
\pi_{ij} = \frac{\exp(\alpha_1x_{1ij}+\cdots+\alpha_rx_{rij})}
 {\sum_{k\in\mathcal{S}_i}\exp(\alpha_1x_{1ik}+\cdots+\alpha_rx_{rik})}.
$$

It is worth noting that the conditional logit model does not require
that all individuals face the same choice sets, only that the
alternatives in the choice sets can be distinguished from one another by
the attribute variables.

The similarities and differences between these models and the
baseline-category logit model become obvious if one looks at the
log-odds relative to the first alternative in the choice set:

$$
\ln\frac{\pi_{ij}}{\pi_{i1}}
=
\alpha_{1}(x_{1ij}-x_{1i1})+\cdots+\alpha_{r}(x_{rij}-x_{ri1}).
$$

Conditional logit models appear more parsimonious than baseline-category
logit models in so far as they have only one coefficient for each
independent variable.[^1] In the "mclogit" package, these models can
be estimated using the function `mclogit()` (see the relevant [manual
page](reference/mclogit.html)).

My interest in conditional logit models derives from my research into
the influence of parties' political positions on the patterns of
voting. Here, the political positions are the attributes of the
alternatives and the choice sets are the sets of parties that run
candidates in a country at various points in time. For an application
of conditional logit models, see my doctoral thesis
`elff:politische.ideologien`{.interpreted-text role="cite"}.

diff --git a/pkg/vignettes/fitting-mclogit.Rmd b/pkg/vignettes/fitting-mclogit.Rmd
new file mode 100644
index 0000000..0351c7d
--- /dev/null
+++ b/pkg/vignettes/fitting-mclogit.Rmd
@@ -0,0 +1,202 @@
---
title: The IWLS algorithm used to fit conditional logit models
output: rmarkdown::html_vignette
vignette: >
  % \VignetteIndexEntry{The IWLS algorithm used to fit conditional logit models}
  % \VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The package "mclogit" fits conditional logit models using a maximum
likelihood estimator. It does this by maximizing the log-likelihood
function using an *iterative weighted least-squares* (IWLS) algorithm,
which follows the algorithm used by the `glm.fit()` function from the
"stats" package of *R*.

If $\pi_{ij}$ is the probability that individual $i$ chooses
alternative $j$ from his/her choice set $\mathcal{S}_i$, where

$$
\pi_{ij}=\frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}
$$

and if $y_{ij}$ is the dummy variable that equals 1 if individual
$i$ chooses alternative $j$ and equals 0 otherwise, the
log-likelihood function (given that the choices are independently
identically distributed given $\pi_{ij}$) can be written as

$$
\ell=\sum_{i,j}y_{ij}\ln\pi_{ij}
 =\sum_{i,j}y_{ij}\eta_{ij}-\sum_i\ln\left(\sum_j\exp(\eta_{ij})\right)
$$

If the data are aggregated in terms of counts, such that
$n_{ij}$ is the number of individuals with the same choice set and
the same choice probabilities $\pi_{ij}$ that have chosen
alternative $j$, the log-likelihood is (given that the choices are
independently identically distributed given $\pi_{ij}$)

$$
\ell=\sum_{i,j}n_{ij}\ln\pi_{ij}
 =\sum_{i,j}n_{ij}\eta_{ij}-\sum_in_{i+}\ln\left(\sum_j\exp(\eta_{ij})\right)
$$

where $n_{i+}=\sum_{j\in\mathcal{S}_i}n_{ij}$.
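As a quick numerical illustration (toy numbers, no package internals),
the choice probabilities and the grouped-counts log-likelihood can be
computed in R like this:

```r
# Two choice sets (sizes 3 and 2); 'set' identifies the choice set of
# each row, 'eta' is the linear predictor and 'n' the observed counts.
eta <- c(0.2, -0.1, 0.0, 0.5, 0.5)
set <- c(1, 1, 1, 2, 2)
n   <- c(10, 5, 5, 7, 3)

pi_hat <- exp(eta) / ave(exp(eta), set, FUN = sum)  # pi_ij within each set
loglik <- sum(n * log(pi_hat))                      # sum_ij n_ij ln(pi_ij)
loglik
```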
If

$$
\eta_{ij} =
\alpha_1x_{1ij}+\cdots+\alpha_rx_{rij}=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}
$$

then the gradient of the log-likelihood with respect to the coefficient
vector $\boldsymbol{\alpha}$ is

$$
\frac{\partial\ell}{\partial\boldsymbol{\alpha}}
=
\sum_{i,j}
\frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}}
\frac{\partial\ell}{\partial\eta_{ij}}
=
\sum_{i,j}
\boldsymbol{x}_{ij}
(n_{ij}-n_{i+}\pi_{ij})
=
\sum_{i,j}
\boldsymbol{x}_{ij}
n_{i+}
(y_{ij}-\pi_{ij})
=
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
$$

and the Hessian is

$$
\frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\,\partial\boldsymbol{\alpha}'}
=
\sum_{i,j,k}
\frac{\partial\eta_{ij}}{\partial\boldsymbol{\alpha}}
\frac{\partial\eta_{ik}}{\partial\boldsymbol{\alpha}'}
\frac{\partial^2\ell}{\partial\eta_{ij}\,\partial\eta_{ik}}
=
-
\sum_{i,j,k}
\boldsymbol{x}_{ij}
n_{i+}
\pi_{ij}(\delta_{jk}-\pi_{ik})
\boldsymbol{x}_{ik}'
=
-
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
$$

Here $y_{ij}=n_{ij}/n_{i+}$, while
$\boldsymbol{N}$ is a diagonal matrix with diagonal elements
$n_{i+}$.

Newton-Raphson iterations then take the form

$$
\boldsymbol{\alpha}^{(s+1)}
=
\boldsymbol{\alpha}^{(s)}
-
\left(
\frac{\partial^2\ell}{\partial\boldsymbol{\alpha}\,\partial\boldsymbol{\alpha}'}
\right)^{-1}
\frac{\partial\ell}{\partial\boldsymbol{\alpha}}
=
\boldsymbol{\alpha}^{(s)}
+
\left(
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\right)^{-1}
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
$$

where $\boldsymbol{\pi}$ and $\boldsymbol{W}$ are evaluated at
$\boldsymbol{\alpha}=\boldsymbol{\alpha}^{(s)}$.

Multiplying by $\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}$ gives

$$
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\boldsymbol{\alpha}^{(s+1)}
=
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
\boldsymbol{\alpha}^{(s)}
+
\boldsymbol{X}'\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})
=
\boldsymbol{X}'\boldsymbol{W}
\left(\boldsymbol{X}\boldsymbol{\alpha}^{(s)}+\boldsymbol{W}^-\boldsymbol{N}(\boldsymbol{y}-\boldsymbol{\pi})\right)
=
\boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*
$$

where $\boldsymbol{W}^-$ is a generalized inverse of $\boldsymbol{W}$
and $\boldsymbol{y}^*$ is a "working response vector" with elements

$$
y_{ij}^*=\boldsymbol{x}_{ij}'\boldsymbol{\alpha}^{(s)}+\frac{y_{ij}-\pi_{ij}}{\pi_{ij}}
$$

The IWLS algorithm thus involves the following steps:

1. Create some suitable starting values for $\boldsymbol{\pi}$,
   $\boldsymbol{W}$, and $\boldsymbol{y}^*$

2. Construct the "working dependent variable" $\boldsymbol{y}^*$

3. Solve the equation

   $$
   \boldsymbol{X}'\boldsymbol{W}\boldsymbol{X}
   \boldsymbol{\alpha}
   =
   \boldsymbol{X}'\boldsymbol{W}\boldsymbol{y}^*
   $$

   for $\boldsymbol{\alpha}$.

4. Compute updated $\boldsymbol{\eta}$, $\boldsymbol{\pi}$,
   $\boldsymbol{W}$, and $\boldsymbol{y}^*$.

5. Compute the updated value for the log-likelihood or the deviance

   $$
   d=2\sum_{i,j}n_{ij}\ln\frac{y_{ij}}{\pi_{ij}}
   $$

6. If the decrease of the deviance (or the increase of the
   log-likelihood) is smaller than a given tolerance criterion
   (typically $\Delta d \leq 10^{-7}$) stop the algorithm and declare
   it converged. Otherwise go back to step 2 with the updated value
   of $\boldsymbol{\alpha}$.

The starting values for the algorithm used by the *mclogit* package are
constructed as follows:
1. Set

   $$
   \eta_{ij}^{(0)} = \ln (n_{ij}+\tfrac12)
   - \frac1{q_i}\sum_{k\in\mathcal{S}_i}\ln (n_{ik}+\tfrac12)
   $$

   (where $q_i$ is the size of the choice set $\mathcal{S}_i$)

2. Compute the starting values of the choice probabilities
   $\pi_{ij}^{(0)}$ according to the equation at the beginning of
   the page

3. Compute initial values of the working dependent variable according to

   $$
   y_{ij}^{*(0)}
   =
   \eta_{ij}^{(0)}+\frac{y_{ij}-\pi_{ij}^{(0)}}{\pi_{ij}^{(0)}}
   $$
diff --git a/pkg/vignettes/random-effects.Rmd b/pkg/vignettes/random-effects.Rmd
new file mode 100644
index 0000000..7c53c2d
--- /dev/null
+++ b/pkg/vignettes/random-effects.Rmd
@@ -0,0 +1,73 @@
---
title: Random effects in baseline logit models and conditional logit models
output: rmarkdown::html_vignette
vignette: >
  % \VignetteIndexEntry{Random effects in baseline logit models and conditional logit models}
  % \VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The "mclogit" package allows for the presence of random effects in
baseline-category logit and conditional logit models. In
baseline-category logit models, the random effects may represent
(unobserved) characteristics that are common to the individuals in
clusters, such as regional units or electoral districts or the like. In
conditional logit models, random effects may represent attributes that
are shared across several choice occasions within the same context of
choice. That is, if one analyses voting behaviour across countries, then
a random effect specific to the Labour party may represent unobserved
attributes of this party in terms of which it differs from (or is more
like) the Social Democratic Party of Germany (SPD). My original
motivation for working on conditional logit models with random effects
was to make it possible to assess the impact of parties' political
positions on the patterns of voting behaviour in various European
countries. The results of this research are published in an article in
*Electoral Studies* `elff:divisions.positions.voting`{.interpreted-text
role="cite"}.

In its earliest incarnation, the package supported only a very simple
random-intercept extension of conditional logit models (or "mixed
conditional logit models", hence the name of the package). These models
can be written as

$$
\pi_{ij} = \frac{\exp(\eta_{ij})}{\sum_{k\in\mathcal{S}_i}\exp(\eta_{ik})}
$$

with

$$
\eta_{ij}=\sum_h\alpha_hx_{hij}+\sum_kz_{ik}b_{jk}
$$

where $x_{hij}$ represents values of independent variables, $\alpha_h$
are coefficients, $z_{ik}$ are dummy variables (that are equal to
$1$ if $i$ is in cluster $k$ and equal to $0$ otherwise), and
$b_{jk}$ are random effects with a normal distribution with expectation
$0$ and variance parameter $\sigma^2$.

Later releases also added support for baseline-category logit models
(initially only without random effects). In order to support random
effects in baseline-category logit models, the package had to be further
modified to allow for conditional logit models with random slopes (this
is so because baseline-category logit models can be expressed as a
particular type of conditional logit models).

It should be noted that estimating the parameters of random-effects
multinomial logit models (whether of the baseline-category logit variety
or the conditional logit variety) involves considerable challenges
already known from the "generalized linear mixed models" literature.
The main challenge is that the likelihood function involves analytically
intractable integrals (i.e. there is no way to "solve" or eliminate
the integrals from the formula of the likelihood function). This means
that either computationally intensive methods for the computation of
such integrals have to be used, or certain approximations (most notably
the Laplace approximation technique and its variants), which may lead to
biases in certain situations. The "mclogit" package only supports
approximate likelihood-based inference. For most of its existence, only
the PQL technique based on a (first-order) Laplace approximation was
supported; since release 0.8, "mclogit" also supports the MQL technique,
which is based on a (first-order) Solomon-Cox approximation. The ideas
behind the PQL and MQL techniques are described e.g. in
`breslow.clayton:approximate.inference.glmm`{.interpreted-text
role="citet"}.
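To close with a usage sketch: a call along the following lines fits
such models; `dat`, `choice`, `chid`, `x`, and `district` are
hypothetical placeholder names, not a data set shipped with the package.

```r
library(mclogit)

# Hypothetical data: 'choice' indicates the chosen alternative (or its
# count), 'chid' identifies the choice set, 'x' is an attribute of the
# alternatives, and 'district' defines the clusters for the random
# intercept. 'dat' is a placeholder data frame.
fit_pql <- mclogit(cbind(choice, chid) ~ x,
                   random = ~ 1 | district,
                   data = dat,
                   method = "PQL")   # the Laplace-based PQL technique

# The same model estimated with the MQL technique instead:
fit_mql <- update(fit_pql, method = "MQL")

# A baseline-category logit model with a random intercept is fitted
# analogously with mblogit(), e.g.
# mblogit(y ~ x1 + x2, random = ~ 1 | cluster, data = dat2)
```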