Mixture Model Parameter Estimation
Visser & Speekenbrink: Sec 2.3
Maximum Likelihood Estimator
The MLE is defined as the parameter values that maximize the likelihood function defined in the previous chapter, given the observations (treating the observations as fixed).
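In symbols, with the observations $y_1, \ldots, y_T$ treated as fixed and the mixture likelihood from the previous chapter (the mixing probabilities $\pi_i$, component densities $f_i$, and density parameters $\theta_i$ are the notation assumed here):

$$\hat{\theta}_{\mathrm{ML}} = \arg\max_{\theta} L(\theta \mid y_{1:T}) = \arg\max_{\theta} \prod_{t=1}^{T} \sum_{i=1}^{N} \pi_i\, f_i(y_t \mid \theta_i)$$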
Label Switching
In a mixture model, the component labels can be permuted without changing the distribution (e.g., if there are two normal components, swapping the parameters of state 1 with the parameters of state 2 gives the same mixture). Label switching has more severe consequences for Bayesian parameter estimation and inference techniques, and for bootstrapping; in those cases, relabeling the components to make them consistent over iterations becomes necessary.
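A quick illustrative check (not from the book): writing the same two-component mixture with the labels swapped gives an identical log-likelihood.

y <- rnorm(10)
# component 1 = N(0, 1) with weight 0.3, component 2 = N(5, 2) with weight 0.7
ll_a <- sum(log(0.3 * dnorm(y, 0, 1) + 0.7 * dnorm(y, 5, 2)))
# the same mixture with the two component labels swapped
ll_b <- sum(log(0.7 * dnorm(y, 5, 2) + 0.3 * dnorm(y, 0, 1)))
all.equal(ll_a, ll_b)  # TRUE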
Method 1: Numerical Optimization of the Likelihood
Numerical optimization iteratively finds the minimum of a function. Consider the trivial example below:
GaussMix2 <- function(par, y){
# set parameter values
p1 <- par[1]; p2 <- 1 - par[1]
m1 <- par[2]; m2 <- par[3]; s1 <- par[4]; s2 <- par[5]
# return negative log-likelihood
-sum(log(p1 * dnorm(y, mean = m1, sd = s1) + p2 * dnorm(y, mean = m2, sd = s2)))
}
# define starting values
pstart <- c("p1"=.5,"m1"=1,"m2"=2,"s1"=.5,"s2"=.5)
# simulate data from a two-component Gaussian mixture
y_obs <- c(rnorm(1000, 0, 1), rnorm(1000, 5, 2))
# run optim (L-BFGS-B allows box constraints on the parameters)
opt <- optim(par = pstart, fn = GaussMix2, y = y_obs,
             lower = c(0, -Inf, -Inf, 0, 0), upper = c(1, rep(Inf, 4)),
             method = "L-BFGS-B")
# read output
opt$par
# p1 m1 m2 s1 s2
# 0.50854228 0.03480772 5.02649332 0.98800874 2.03568627
GaussMix2 is a function that takes the parameter vector and the observations y and returns the negative log-likelihood of the two-component Gaussian mixture, which optim then minimizes:

$$-\ell(\theta) = -\sum_{t=1}^{T} \log\big( p_1\, \phi(y_t \mid m_1, s_1) + p_2\, \phi(y_t \mid m_2, s_2) \big)$$

where $\phi(\cdot \mid m, s)$ is the normal density with mean $m$ and standard deviation $s$.
The optim arguments passed are:
par = pstart: initial values for the parameters to be optimized over.
fn = GaussMix2: the function to be minimized; its first argument is the vector of parameters over which minimization takes place.
y = y_obs: an additional argument passed on to fn.
The requirements that the mixing probability lie between 0 and 1 and that the standard deviations be positive are enforced through the lower and upper arguments (which is why method = "L-BFGS-B" is used).
Expectation Maximization (EM) Algorithm
EM Goal and Use Case:
Parameter estimation for problems with missing or latent data, where maximum likelihood estimation would be easy if we knew the missing or latent values.
EM Idea
Impute expected values for the latent data/states, and then estimate the parameters by optimizing the joint likelihood of the parameters given the observations and the imputed latent states. We can then alternate between the following two steps:
Expectation step: (re)compute the expectation of the complete-data log-likelihood (the likelihood of the observed data together with the hidden states).
Maximization step: the expected complete-data log-likelihood is a function of the parameters $\theta$ but not of the unobserved states $S_t$, so we can find the optimal values of $\theta$ by maximizing the expected complete-data log-likelihood.
EM in Mixture Model
The joint log-likelihood function (the complete-data log-likelihood) can be decomposed into two parts:

$$\log P(y_{1:T}, S_{1:T} = s_{1:T} \mid \theta) = \sum_{t=1}^{T} \log P(S_t = s_t \mid \pi) + \sum_{t=1}^{T} \log f(y_t \mid S_t = s_t, \theta_{s_t})$$

Compared to the model log-likelihood, this expression is the joint log-likelihood of the data and the hidden states; the hidden states are normally not available in the data. The first part is the probability of $S_t$ (the state at time $t$) given the mixing probability parameters $\pi$. The second part is the probability (density) of $y_t$ (the observation at time $t$) given the state at that time and the density parameters $\theta_i$ of that component.
The expected value of the complete-data log-likelihood with respect to all possible configurations of hidden states $s_{1:T}$ (one per observation), given the observations $y_{1:T}$ and the initial/previous guess of the parameters $\theta'$, is

$$Q(\theta, \theta') = \sum_{s_{1:T}} P(S_{1:T} = s_{1:T} \mid y_{1:T}, \theta')\, \log P(y_{1:T}, S_{1:T} = s_{1:T} \mid \theta) = \sum_{s_{1:T}} P(S_{1:T} = s_{1:T} \mid y_{1:T}, \theta') \sum_{t=1}^{T} \Big[ \log P(S_t = s_t \mid \pi) + \log f(y_t \mid S_t = s_t, \theta_{s_t}) \Big]$$

The expectation is a function of both the parameters $\theta$ and the initial/previous guess $\theta'$.
The first equal sign applies the definition of expectation.
The second equal sign applies the definition of the complete-data (joint) log-likelihood.
Notice that $\sum_{s_{1:T}}$ is just shorthand for summing over all possible configurations of the hidden states from time 1 to T.
The $P(S_{1:T} = s_{1:T} \mid y_{1:T}, \theta')$ term gives the posterior probabilities of the states given the initial/previous guess of the parameters.
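Because the observations in a mixture model are independent, the sum over all state configurations collapses to a per-observation sum, a standard simplification written here in the notation assumed above:

$$Q(\theta, \theta') = \sum_{t=1}^{T} \sum_{i=1}^{N} P(S_t = i \mid y_t, \theta') \big[ \log \pi_i + \log f_i(y_t \mid \theta_i) \big]$$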
So overall, the EM algorithm for the mixture model can be described as:
Start with an initial set of parameters $\theta'$.
Repeat until convergence:
Compute the posterior probabilities and the log-likelihood, as described in the previous chapter.
Obtain new estimates $\theta^* = \arg\max_{\theta} Q(\theta, \theta')$ by maximizing the expected complete-data log-likelihood.
Set $\theta' = \theta^*$ and return to the expectation step.
Convergence can be checked by 1) the change (e.g., the norm of the difference) between successive parameter estimates, or 2) the relative increase in the log-likelihood. In R this can be done through the package depmixS4.
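For example, a two-component Gaussian mixture can be specified with depmixS4's mix() and estimated with fit(), which runs EM. The call below is a sketch reusing y_obs from the optim example above; defaults and output details may differ across package versions.

library(depmixS4)
# specify a two-component Gaussian mixture model for y_obs
mod <- mix(y ~ 1, data = data.frame(y = y_obs), nstates = 2,
           family = gaussian())
fmod <- fit(mod)   # fit() estimates the parameters with EM
summary(fmod)      # mixing proportions and component means/sds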
In summary, the EM algorithm for the mixture model alternates between
computing the expected component membership probabilities (the posterior state probabilities), and
optimizing the response model parameters conditional on these expected component memberships.
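A minimal hand-rolled sketch of this alternation for the two-component Gaussian mixture from the optim example (an assumed implementation, not the book's code; convergence is checked via the relative increase in the log-likelihood):

em_gaussmix2 <- function(y, p1 = 0.5, m1 = 0, m2 = 1, s1 = 1, s2 = 1,
                         tol = 1e-8, maxit = 500) {
  ll_old <- -Inf
  for (it in 1:maxit) {
    # E-step: posterior probability that each observation belongs to component 1
    d1 <- p1 * dnorm(y, m1, s1)
    d2 <- (1 - p1) * dnorm(y, m2, s2)
    ll <- sum(log(d1 + d2))      # log-likelihood under the current parameters
    gamma1 <- d1 / (d1 + d2)
    # stop when the relative increase in the log-likelihood is small
    if (is.finite(ll_old) && abs(ll - ll_old) < tol * abs(ll_old)) break
    ll_old <- ll
    # M-step: weighted maximum-likelihood updates given the posteriors
    p1 <- mean(gamma1)
    m1 <- sum(gamma1 * y) / sum(gamma1)
    m2 <- sum((1 - gamma1) * y) / sum(1 - gamma1)
    s1 <- sqrt(sum(gamma1 * (y - m1)^2) / sum(gamma1))
    s2 <- sqrt(sum((1 - gamma1) * (y - m2)^2) / sum(1 - gamma1))
  }
  list(par = c(p1 = p1, m1 = m1, m2 = m2, s1 = s1, s2 = s2),
       loglik = ll, iterations = it)
}
em_gaussmix2(y_obs, m1 = 1, m2 = 2, s1 = 0.5, s2 = 0.5)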
Handling Constraints
Linear (in)equality constraints on the parameters can be represented as $b_{\mathrm{lower}} \le A\theta \le b_{\mathrm{upper}}$, where an equality constraint corresponds to setting the lower and upper bound equal. Constrained numerical optimization can then be performed using packages such as Rsolnp, Rdonlp2, or depmixS4.
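As an illustration (an assumed example, not from the book), Rsolnp's solnp() can minimize the GaussMix2 negative log-likelihood from above while imposing the linear inequality m2 - m1 >= 0, an ordering constraint that also removes the label-switching ambiguity. Finite bounds are used here because solnp expects bounded constraint regions.

library(Rsolnp)
# inequality function: returns m2 - m1, constrained to lie in [0, 100]
ineq_m <- function(par, y) par[3] - par[2]
sol <- solnp(pars = pstart, fun = GaussMix2, y = y_obs,
             ineqfun = ineq_m, ineqLB = 0, ineqUB = 100,
             LB = c(0, -100, -100, 1e-4, 1e-4),
             UB = c(1, 100, 100, 100, 100))
sol$pars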
Starting Values
There are two choices, especially concerning the EM algorithm:
Start with density parameter values for the components (obtained through methods such as k-means, as sketched below) and equal mixing probabilities.
Start with posterior probabilities (e.g., a random assignment of each case to a component, giving a one-hot vector of posterior probabilities per case) and go directly to the maximization step in the first iteration.
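A possible way to build starting values of the first kind from a k-means clustering (illustrative only; the object names are assumptions, and y_obs is from the example above):

km <- kmeans(y_obs, centers = 2)
# component means, standard deviations, and mixing proportion from the clusters
start_m <- as.vector(km$centers)
start_s <- tapply(y_obs, km$cluster, sd)
start_p1 <- mean(km$cluster == 1)
pstart_km <- c(p1 = start_p1, m1 = start_m[1], m2 = start_m[2],
               s1 = unname(start_s[1]), s2 = unname(start_s[2]))
pstart_km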