EM Algorithm and Mixtures of Generalized Linear Models
Generalized Linear Models (GLM)
Components of a GLM
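A GLM assumes that each observation $y_i$ follows a distribution from the exponential family, with density
$$f(y_i \mid \theta_i, \phi) = \exp\left\{ \frac{y_i \theta_i - b(\theta_i)}{a_i(\phi)} + c(y_i, \phi) \right\}, \qquad a_i(\phi) = \frac{\phi}{w_i}$$
$\theta_i, \phi$: location and scale parameters
$\phi$ is often a known constant; for the Binomial and Poisson distributions it equals 1. If it is unknown, it is estimated by the method of moments. Note that $\theta$ and $\phi$ are orthogonal parameters, so estimating one does not affect the estimate of the other
$w_i$: known weights
$a, b, c$: known functions
The mean and variance are
$$E(y_i) = \mu_i = b'(\theta_i), \qquad \mathrm{Var}(y_i) = b''(\theta_i)\, a_i(\phi) = \frac{\phi\, b''(\theta_i)}{w_i}$$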
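The distribution can also depend on predictors $x_i$ through the linear predictor $\eta_i = x_i^\top \beta$: the mean is assumed to be a smooth function of this linear combination of predictors,
$$\mu_i = g^{-1}(\eta_i), \qquad \text{equivalently} \qquad g(\mu_i) = \eta_i = x_i^\top \beta$$
$g$ is called the link function
If $\eta_i = \theta_i$, $g$ is called the canonical link function, and $g = (b')^{-1}$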
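As a worked illustration, the Poisson distribution can be matched against the exponential family form. This is a sketch that assumes the density is written as $\exp\{(y\theta - b(\theta))/a(\phi) + c(y, \phi)\}$ with $a(\phi) = \phi / w$; the symbol names are the conventional ones and may differ from other texts.

```latex
\begin{align*}
f(y \mid \mu) &= \frac{e^{-\mu} \mu^{y}}{y!}
               = \exp\left\{ y \log \mu - \mu - \log y! \right\} \\
\theta &= \log \mu, \qquad b(\theta) = e^{\theta}, \qquad
\phi = 1, \quad w = 1, \qquad c(y, \phi) = -\log y! \\
E(y) &= b'(\theta) = e^{\theta} = \mu, \qquad
\mathrm{Var}(y) = b''(\theta)\, a(\phi) = e^{\theta} = \mu \\
g(\mu) &= (b')^{-1}(\mu) = \log \mu \quad \text{(canonical link)}
\end{align*}
```

The log link is canonical for the Poisson because it makes the linear predictor equal to $\theta$.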
Model Likelihood
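The log-likelihood for $n$ independent observations from an exponential family distribution is
$$\ell(\theta, \phi; y) = \sum_{i=1}^{n} \left[ \frac{w_i \left( y_i \theta_i - b(\theta_i) \right)}{\phi} + c(y_i, \phi) \right]$$
Note again that the likelihood function is not a pdf: it is a function of the parameters, with the data held fixed.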
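To make the "function of the parameters" point concrete, here is a minimal sketch of the log-likelihood of a Poisson GLM with the canonical log link ($\phi = 1$, $w_i = 1$). The function name `poisson_loglik` and the toy data are assumptions made for illustration only.

```python
import numpy as np
from math import lgamma

def poisson_loglik(beta, X, y):
    """Poisson GLM log-likelihood (log link), viewed as a function of beta."""
    theta = X @ beta                       # canonical link: theta_i = eta_i = x_i' beta
    b = np.exp(theta)                      # b(theta) = exp(theta)
    c = -np.array([lgamma(v + 1.0) for v in y])  # c(y, phi) = -log(y!)
    return float(np.sum(y * theta - b + c))

# Toy data (hypothetical): intercept plus one predictor.
X = np.column_stack([np.ones(4), np.array([0.0, 1.0, 2.0, 3.0])])
y = np.array([1.0, 2.0, 4.0, 9.0])

# Same data, two different parameter values, two different likelihood values.
ll_a = poisson_loglik(np.array([0.0, 0.7]), X, y)
ll_b = poisson_loglik(np.array([2.0, -1.0]), X, y)
```

The data stay fixed while `beta` varies, so `ll_a` and `ll_b` compare how well two candidate parameter vectors explain the same observations.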
Estimate GLM Through Iteratively Weighted Least Squares
The GLM coefficients $\beta$ can be estimated through Iteratively Weighted Least Squares (IWLS), and it can be shown that the resulting estimates are the maximum likelihood estimates.
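Start with an initial guess of $\hat{\mu}$ and $\hat{\eta} = g(\hat{\mu})$
Obtain the working response $z = \hat{\eta} + (y - \hat{\mu}) \left. \frac{d\eta}{d\mu} \right|_{\hat{\mu}}$
Obtain the working weights $W_{ii} = w_i \Big/ \left[ V(\hat{\mu}_i) \left( \left. \frac{d\eta}{d\mu} \right|_{\hat{\mu}_i} \right)^{2} \right]$, where $V(\mu) = b''(\theta)$ is the variance function
Estimate $\hat{\beta} = (X^\top W X)^{-1} X^\top W z$
$X$: model matrix
$W$: diagonal matrix with the working weights
$z$: vector of working responses
Obtain the new estimates $\hat{\eta} = X \hat{\beta}$ and $\hat{\mu} = g^{-1}(\hat{\eta})$
Repeat the steps until convergence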
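The steps above can be sketched for one concrete case, a Poisson GLM with log link, where $d\eta/d\mu = 1/\mu$ and $V(\mu) = \mu$. This is an illustrative sketch, not a reference implementation: the function name `iwls_poisson`, the starting value `y + 0.5`, and the simulated data are all assumptions.

```python
import numpy as np

def iwls_poisson(X, y, prior_w=None, max_iter=50, tol=1e-10):
    """IWLS for a Poisson GLM with log link; prior_w are the known weights w_i."""
    n, p = X.shape
    w = np.ones(n) if prior_w is None else np.asarray(prior_w, dtype=float)
    mu = y + 0.5                      # initial guess, kept away from zero
    eta = np.log(mu)                  # eta = g(mu) with g = log
    beta = np.zeros(p)
    for _ in range(max_iter):
        z = eta + (y - mu) / mu       # working response: eta + (y - mu) * d eta / d mu
        W = w * mu                    # working weight: w / (V(mu) * (1/mu)^2) = w * mu
        XtW = X.T * W                 # X' W without forming the diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)  # weighted least squares
        eta = X @ beta_new            # new linear predictor
        mu = np.exp(eta)              # new mean, mu = g^{-1}(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

# Simulated data (hypothetical): true coefficients (0.5, 1.0).
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200)
X = np.column_stack([np.ones(200), x])
y = rng.poisson(np.exp(0.5 + 1.0 * x)).astype(float)
beta_hat = iwls_poisson(X, y)
```

At convergence the Poisson score equations $X^\top (y - \hat{\mu}) = 0$ hold, which is one way to check that the fixed point of IWLS is the maximum likelihood estimate.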
Connection to EM Algorithm and Mixture Model
Summary: If each component density of a mixture model belongs to the GLM family, IWLS can be used to numerically maximize each component's weighted log-likelihood in the M-step.
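Recall that in a mixture model we can define the joint (complete-data) log-likelihood as a function of the data, the hidden states, and the parameters. Its expectation over all possible hidden states is
$$Q = \sum_{i=1}^{n} \sum_{k=1}^{K} \tau_{ik} \left[ \log \pi_k + \log f_k(y_i \mid \psi_k) \right]$$
where $\tau_{ik}$ is the posterior probability that observation $i$ belongs to component $k$ given the current parameter estimates, $\pi_k$ is the mixing proportion of component $k$, and $f_k$ is its density with parameters $\psi_k$.
Part of the EM process is then to obtain the parameter estimates that maximize this expected complete-data log-likelihood.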
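A minimal sketch of how the posterior probabilities and the expected complete-data log-likelihood might be computed from per-component log densities. The names `e_step`, `log_dens`, and `log_pi` are hypothetical, and the log-sum-exp normalization is a common numerical safeguard rather than something prescribed by the text.

```python
import numpy as np

def e_step(log_dens, log_pi):
    """Posterior membership probabilities and expected complete-data log-likelihood.

    log_dens: (n, K) array of log f_k(y_i | psi_k)
    log_pi:   (K,) array of log mixing proportions
    """
    a = log_dens + log_pi                          # log pi_k + log f_k(y_i)
    a_max = a.max(axis=1, keepdims=True)           # stabilize the exponentials
    log_norm = a_max + np.log(np.exp(a - a_max).sum(axis=1, keepdims=True))
    tau = np.exp(a - log_norm)                     # each row sums to 1
    Q = float(np.sum(tau * a))                     # sum_i sum_k tau_ik [log pi_k + log f_k]
    return tau, Q

# Degenerate demo: identical component densities, so tau equals the mixing proportions.
log_dens_demo = np.zeros((3, 2))
log_pi_demo = np.log(np.array([0.25, 0.75]))
tau_demo, Q_demo = e_step(log_dens_demo, log_pi_demo)
```

The returned `tau` is what the M-step uses as observation weights for each component.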
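If each component density has its own separate set of parameters, the components can be estimated separately. The parameters $\psi_k$ of component $k$ can then be estimated by maximizing the weighted log-likelihood
$$\hat{\psi}_k = \arg\max_{\psi_k} \sum_{i=1}^{n} \tau_{ik} \log f_k(y_i \mid \psi_k)$$
where the weight $\tau_{ik}$ is the posterior probability that observation $i$ belongs to component $k$.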
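The tricky part of EM for mixture models is that the maximum cannot always be obtained analytically. But if the density of component $k$ belongs to the generalized linear model family, the weighted log-likelihood can be rewritten as
$$\sum_{i=1}^{n} \tau_{ik} \left[ \frac{w_i \left( y_i \theta_{ik} - b(\theta_{ik}) \right)}{\phi_k} + c(y_i, \phi_k) \right] = \frac{1}{\phi_k} \sum_{i=1}^{n} \tau_{ik} w_i \left( y_i \theta_{ik} - b(\theta_{ik}) \right) + \sum_{i=1}^{n} \tau_{ik}\, c(y_i, \phi_k)$$
where $\tau_{ik}$ is the posterior probability that observation $i$ belongs to component $k$.
Only the first part affects the estimation of $\theta_{ik}$, and subsequently of $\beta_k$, so we can rely on the IWLS routine with weights $\tau_{ik} w_i$.
Further note that we cannot use IWLS to estimate $\phi_k$, since the weights in the two parts are not the same ($\tau_{ik} w_i$ vs. $\tau_{ik}$). But $\phi_k$ is generally estimated through the method of moments anyway.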
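One M-step for a mixture of Poisson regressions might look like the sketch below, where each component is fitted by IWLS with the posterior probabilities as observation weights ($w_i = 1$, $\phi_k = 1$). The function names, the Dirichlet draw standing in for real E-step output, and the simulated data are assumptions; the mixing-proportion update $\hat{\pi}_k = \frac{1}{n} \sum_i \tau_{ik}$ is the standard EM update and is not spelled out in the text above.

```python
import numpy as np

def weighted_iwls_poisson(X, y, weights, max_iter=100, tol=1e-10):
    """IWLS for a Poisson GLM with log link and observation weights tau_ik * w_i."""
    mu = y + 0.5                       # initial guess, kept away from zero
    eta = np.log(mu)
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        z = eta + (y - mu) / mu        # working response
        W = weights * mu               # working weight scaled by the posterior weight
        XtW = X.T * W
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new
        mu = np.exp(eta)
        converged = np.max(np.abs(beta_new - beta)) < tol
        beta = beta_new
        if converged:
            break
    return beta

def m_step(X, y, tau):
    """Update mixing proportions and fit each component separately."""
    pi = tau.mean(axis=0)              # standard EM update for the mixing proportions
    betas = [weighted_iwls_poisson(X, y, tau[:, k]) for k in range(tau.shape[1])]
    return pi, np.array(betas)

# Simulated data and stand-in posterior probabilities (both hypothetical).
rng = np.random.default_rng(1)
n = 200
x = rng.uniform(0.0, 1.0, size=n)
X = np.column_stack([np.ones(n), x])
y = rng.poisson(np.exp(0.5 + 1.0 * x)).astype(float)
tau = rng.dirichlet([1.0, 1.0], size=n)   # rows sum to 1, like E-step output
pi_hat, beta_hat = m_step(X, y, tau)
```

In a full EM loop this M-step would alternate with an E-step that recomputes `tau` from the updated `pi_hat` and `beta_hat` until the log-likelihood stops improving.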