EM Algorithm and Mixtures of Generalized Linear Models
Generalized Linear Models (GLM)
Components of GLM
GLM assumes the observations follow an exponential family distribution (to keep this brief, there is a good Stack Exchange post about this, and good notes from Ryan Tibshirani on GLMs), with density of the form

$$f(y_t;\theta_t,\phi) = \exp\left\{ \frac{w_t\big[y_t\theta_t - b(\theta_t)\big]}{\phi} + c(y_t,\phi) \right\}$$

where
- $\theta_t, \phi$: location and scale parameters
- Often $\phi$ is a known constant; for the Binomial and Poisson distributions it equals 1. If it is unknown, it is estimated by the method of moments. Note that $\theta_t$ and $\phi$ are orthogonal parameters, so their estimation does not affect each other.
- $w_t$: known weights
- $b, c$: known functions
The mean and variance are

$$E(y_t) = \mu_t = b'(\theta_t), \qquad \mathrm{Var}(y_t) = \frac{\phi\, b''(\theta_t)}{w_t}$$
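As a quick check of these formulas, for the Normal distribution one can take $\theta_t = \mu_t$, $b(\theta) = \theta^2/2$, $\phi = \sigma^2$, and $w_t = 1$, which gives $E(y_t) = b'(\theta_t) = \mu_t$ and $\mathrm{Var}(y_t) = \phi\, b''(\theta_t) = \sigma^2$, as expected.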
The distribution can also depend on predictors $x_j$ through a linear predictor, such that the mean is assumed to be a smooth function of the linear combination of predictors:

$$\mu_t = h(\eta_t), \qquad \eta_t = \sum_j \beta_j x_{tj}$$

Here $h$ is the inverse of the link function: the link $g = h^{-1}$ maps the mean to the linear predictor. If $h = b'$, so that the link is $(b')^{-1}$, we call this the canonical link function, and then $\theta_t = \eta_t$.
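For example, for the Poisson distribution $b(\theta) = e^{\theta}$, so $b' = \exp$ and the canonical link is $(b')^{-1} = \log$: the familiar log-linear model $\log \mu_t = \eta_t$.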
Model Likelihood
The log-likelihood for $T$ independent observations from an exponential family distribution is

$$\ell(\theta,\phi) = \sum_{t=1}^{T} \left\{ \frac{w_t\big[y_t\theta_t - b(\theta_t)\big]}{\phi} + c(y_t,\phi) \right\}$$
To note again: the likelihood is not a pdf. It is a function of the parameters, with the observed data held fixed.
Estimating GLMs Through Iteratively Weighted Least Squares
The $\beta$ coefficients of a GLM can be estimated through iteratively weighted least squares (IWLS), and it can be shown that the resulting estimates are the maximum likelihood estimates.
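To sketch why (a standard Fisher scoring argument, stated here in the notation above; $z$ and $W$ are the working response and weights defined in the steps below), the score and expected information are

$$\frac{\partial \ell}{\partial \beta} = \frac{1}{\phi}\, X^T W (z - \eta), \qquad E\!\left[-\frac{\partial^2 \ell}{\partial \beta\, \partial \beta^T}\right] = \frac{1}{\phi}\, X^T W X,$$

so one Fisher scoring update is $\beta + (X^T W X)^{-1} X^T W (z - \eta) = (X^T W X)^{-1} X^T W z$, using $\eta = X\beta$. At convergence the score is zero, so the fixed point of these weighted least squares updates is the MLE.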
1. Start with an initial guess $\hat\eta_t$ and set $\hat\mu_t = h(\hat\eta_t)$.
2. Obtain the working response

$$z_t = \hat\eta_t + \frac{\partial \eta_t}{\partial \mu_t}\,(y_t - \hat\mu_t)$$

with the derivative evaluated at the current estimates.
3. Obtain the working weight

$$W_t = \frac{w_t}{b''(\hat\theta_t)\left(\partial \eta_t / \partial \mu_t\right)^2}$$

4. Estimate

$$\hat\beta = (X^T W X)^{-1} X^T W z$$

- $X$: model matrix
- $W$: diagonal matrix of the working weights
- $z$: vector of working responses
5. Obtain the new estimate $\hat\eta = X\hat\beta$.
6. Repeat steps 2-5 until convergence.
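As a concrete illustration, here is a minimal numpy sketch of this routine for a Poisson GLM with the canonical log link, where $h = \exp$, $\partial\eta_t/\partial\mu_t = 1/\mu_t$, and $b''(\hat\theta_t) = \mu_t$, so the working weight simplifies to $W_t = w_t\mu_t$. The function name, starting values, and iteration caps are illustrative choices, not part of any library API.

```python
import numpy as np

def iwls_poisson(X, y, prior_weights=None, n_iter=50, tol=1e-8):
    """Fit a Poisson GLM with canonical log link by IWLS (illustrative sketch)."""
    T, p = X.shape
    w = np.ones(T) if prior_weights is None else np.asarray(prior_weights)
    eta = np.full(T, np.log(y.mean() + 1e-8))  # crude initial guess for eta_t
    beta = np.zeros(p)
    for _ in range(n_iter):
        mu = np.exp(eta)             # mu_t = h(eta_t), with h = b' = exp
        z = eta + (y - mu) / mu      # working response: z_t = eta_t + (d eta/d mu)(y_t - mu_t)
        W = w * mu                   # working weight: w_t / [b''(theta_t)(d eta/d mu)^2] = w_t * mu_t
        XtW = X.T * W                # X^T W, treating W as a diagonal matrix
        beta_new = np.linalg.solve(XtW @ X, XtW @ z)
        eta = X @ beta_new           # new estimate eta = X beta
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

# Quick check on simulated data: estimates should be close to (0.5, 1.0).
rng = np.random.default_rng(0)
X_sim = np.column_stack([np.ones(1000), rng.normal(size=1000)])
y_sim = rng.poisson(np.exp(X_sim @ np.array([0.5, 1.0])))
print(iwls_poisson(X_sim, y_sim))
```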
Connection to EM Algorithm and Mixture Model
Summary: if each component density of a mixture model is in the GLM family, we can numerically maximize each component's weighted likelihood with IWLS.
Recall that in a mixture model we can define the complete-data log-likelihood as a function of the data, the hidden states, and the parameters. Its expectation over all possible hidden states, given the data and the current parameter estimates, is

$$Q(\Theta) = \sum_{t=1}^{T} \sum_{i=1}^{k} p_{ti} \big[ \log \pi_i + \log f_i(y_t; \theta_i) \big], \qquad p_{ti} = P(z_t = i \mid y_t, \Theta^{\text{old}})$$
Part of the EM process (the M-step) is then to obtain the parameter estimates that maximize this expected complete-data log-likelihood.
If each component density has a separate set of parameters, then they can be estimated separately. So the parameters for component $i$ can be estimated by maximizing the weighted log-likelihood

$$\sum_{t=1}^{T} w_t' \log f_i(y_t; \theta_i), \qquad w_t' = p_{ti}$$
Notice the tricky part of EM for mixture models: it is not always possible to obtain this maximum analytically. But if the density function for component $i$ is from the generalized linear model family, we can rewrite the weighted log-likelihood as

$$\sum_{t=1}^{T} w_t' \log f_i(y_t;\theta_t,\phi) = \sum_{t=1}^{T} \frac{w_t' w_t \big[ y_t \theta_t - b(\theta_t) \big]}{\phi} + \sum_{t=1}^{T} w_t'\, c(y_t, \phi)$$

Note that only the first part impacts the estimation of $\beta$, and subsequently $\theta$; that part is exactly a GLM log-likelihood with the prior weights $w_t$ replaced by $w_t' w_t$, so we can rely on the IWLS routine with these modified weights.

Further note that we cannot reuse IWLS to estimate $\phi$, since the weights entering the two sums are no longer the same (they become $w_t' w_t$ in the first sum but $w_t'$ inside the second). But generally $\phi$ is estimated through the method of moments anyway.
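To make the M-step concrete, here is a hedged sketch of one EM iteration for a $k$-component mixture of Poisson regressions, reusing the illustrative `iwls_poisson` from above (for the Poisson, $w_t = 1$, so the modified IWLS weights reduce to the responsibilities $w_t'$). The names `em_step`, `betas`, and `pis` are assumptions for this sketch, not a library API.

```python
import numpy as np
from scipy.stats import poisson

def em_step(X, y, betas, pis):
    """One EM iteration for a mixture of Poisson GLMs (illustrative sketch).

    E-step: responsibilities w'_{ti} = P(z_t = i | y_t, current parameters).
    M-step: weighted IWLS per component (only beta, hence theta, depends on
            the weights here), plus the usual update of the mixing proportions.
    """
    k = len(pis)
    # E-step: pi_i * f_i(y_t) for each component at the current parameters.
    dens = np.column_stack(
        [pis[i] * poisson.pmf(y, np.exp(X @ betas[i])) for i in range(k)]
    )
    resp = dens / dens.sum(axis=1, keepdims=True)  # w'_t for each component
    # M-step: refit each component by IWLS with the responsibilities as weights.
    new_betas = [iwls_poisson(X, y, prior_weights=resp[:, i]) for i in range(k)]
    new_pis = resp.mean(axis=0)                    # update mixing proportions
    return new_betas, new_pis
```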