The term generalized refers to specifying different observational models from the exponential family, denoted by $F$, for the observations $y$. The models are linked to a predictor function $\eta$ by a specific link function $g(\cdot)$,
where $\mu$ represents the mean of the model, $E[y \mid \phi] = \mu$, and $\phi$ represents the other parameters of the model, such as the variance parameter in Normal models or the shape parameter in Gamma models.
The predictor function $\eta$, often called the linear predictor, is usually linked to the mean parameter $\mu$ of the model (although it may also be another parameter of the model) by a strictly monotonic link function $g$ with inverse mapping
$$
\mu = g^{-1}(\eta).
$$
Often, $g$ is chosen to be differentiable so that the maximum likelihood estimate can be obtained conveniently.
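As a minimal sketch of the link/inverse-link relationship, the following uses the log link as an illustrative choice of $g$ (the specific values are hypothetical):

```python
import numpy as np

# Illustrative example: the log link g(mu) = log(mu) and its inverse.
eta = np.array([-2.0, 0.0, 1.5])   # linear predictor values on the real line
mu = np.exp(eta)                   # inverse link: mu = g^{-1}(eta) = exp(eta)

# g is strictly monotonic, so applying the link recovers the predictor.
recovered = np.log(mu)
print(np.allclose(recovered, eta))  # True
```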
The term linear model usually refers to using a parametric form for the functional relationship (predictor function $\eta$) between observed data and predictors (input variables), such that:
$$
\eta = \beta x,
$$
where $\beta$ is the row vector of coefficients in a parametric model and $x$ is the column vector of predictors.
You might be more familiar with the terms 'feature', 'independent variable', or 'covariate' instead of 'predictor'; we will use them interchangeably.
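The linear predictor above can be sketched as a matrix product; the coefficient and predictor values below are made up for illustration:

```python
import numpy as np

# Hypothetical example: eta = beta x, with beta a row vector of
# coefficients and x a column vector of predictors.
beta = np.array([[0.5, -1.2, 2.0]])   # 1 x 3 row vector of coefficients
x = np.array([[1.0], [2.0], [0.5]])   # 3 x 1 column vector of predictors

eta = beta @ x                        # 1 x 1 result: 0.5 - 2.4 + 1.0
print(eta.item())                     # approximately -0.9
```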
In the Bayesian framework, we need to provide prior distributions for the model parameters $\mu$ and $\phi$. In the case of the mean parameter $\mu$, if we have a predictor function, we define the priors on the parameters $\beta$ instead.
Normal model
The observational model is a Normal distribution with mean $\mu$ and noise variance $\sigma^2$:
$$
y \sim \text{Normal}(\mu, \sigma^2).
$$
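A quick simulation sketch of this observational model, assuming (as is standard for the Normal case, though not stated here) an identity link so that $\mu = \eta = \beta x$; all values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed setup: Normal observational model with identity link,
# so mu = eta = X @ beta, plus noise with standard deviation sigma.
beta = np.array([1.0, -0.5])          # illustrative coefficients
X = rng.normal(size=(100, 2))         # 100 observations, 2 predictors
sigma = 0.3

mu = X @ beta                         # identity link: mu equals the predictor
y = rng.normal(loc=mu, scale=sigma)   # simulated Normal observations

print(y.shape)  # (100,)
```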
Poisson model
For count observations, a Poisson observational model can be used. In this case, the link function is the log function, which transforms the values of the linear predictor $\eta$, defined on the continuous real space, to the strictly positive range of the mean of the Poisson model, $\lambda = \exp(\eta)$.
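The effect of the log link can be sketched as follows; the predictor values are randomly generated for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Sketch of the Poisson model with log link (illustrative values).
eta = rng.normal(size=5)        # predictor values, possibly negative
lam = np.exp(eta)               # inverse log link: always strictly positive

print((lam > 0).all())          # True: exp maps the real line to (0, inf)
y = rng.poisson(lam)            # simulated non-negative count observations
```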
Binomial model
In this case, the observations are binary, $y \in \{0, 1\}$. These binary observations are expected to follow a Binomial distribution with a single trial (i.e., a Bernoulli distribution), with probability parameter $p$:
$$
y \sim \text{Bernoulli}(p).
$$
In this case, the probability $p$ is linked to the predictor function $\eta$ through the logit transformation, $\text{logit}(p) = \eta$; its inverse, the logistic (sigmoid) function, transforms the values of the predictor function, defined on the continuous real space, to the $(0, 1)$ range of probabilities. In this model, the probit link function can also be used.
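Both inverse links mentioned above can be sketched with standard formulas (the probit inverse link is the standard Normal CDF, written here via the error function):

```python
import numpy as np
from math import erf, sqrt

def inv_logit(eta):
    """Inverse logit (sigmoid): maps the real line to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-eta))

def inv_probit(eta):
    """Inverse probit: the standard Normal CDF, via the error function."""
    return 0.5 * (1.0 + erf(eta / sqrt(2.0)))

# Both inverse links map eta = 0 to a probability of exactly 0.5.
print(inv_logit(np.array([0.0]))[0])   # 0.5
print(inv_probit(0.0))                 # 0.5
```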
Categorical model
The Categorical model is also called the 'multinomial' model.
In multi-class classification problems, the observations take one of $J$ classes, $y \in \{1, \dots, J\}$. In this case, a multinomial observational model may be used:
where $p = (p_1, \dots, p_j, \dots, p_J)$ is a vector of probabilities for each possible class. In this model, the vector of probabilities $p$ of an observation must sum to 1: $\sum_{j=1}^J p_j = 1$.
The probability of belonging to a class $j$ can be computed by the 'softmax' transformation:
$$
p_j = \frac{\exp(\eta_j)}{\sum_{k=1}^J \exp(\eta_k)}.
$$
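The softmax transformation can be sketched as follows, assuming one predictor value $\eta_j$ per class (the values below are illustrative):

```python
import numpy as np

def softmax(eta):
    """Map a vector of per-class predictors to probabilities summing to one."""
    z = np.exp(eta - eta.max())   # subtract the max for numerical stability
    return z / z.sum()

eta = np.array([2.0, 1.0, -0.5])  # illustrative per-class predictors
p = softmax(eta)

print(np.isclose(p.sum(), 1.0))   # True: probabilities sum to one
```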