Mixture Model Parameter Inference
Visser & Speekenbrink: Sec 2.4, 2.5
Likelihood Ratio Test
The likelihood ratio test is used to test constraints on the parameter space. The statistic is defined as

$$\lambda = 2\left(\ell(\hat{\theta}) - \ell(\hat{\theta}_c)\right)$$

where $\hat{\theta}$ is the MLE of the unconstrained model and $\hat{\theta}_c$ is the MLE of the constrained model.
The asymptotic distribution of $\lambda$ assumes that:

- The constrained model estimates are the true parameters
- Some regularity conditions hold, in particular that the true parameter vector is an interior point of the parameter space (it cannot lie on the boundary, e.g. a standard deviation or a mixing probability equal to 0)

If the constrained model estimates are the true parameters, then under these regularity conditions the statistic is asymptotically chi-squared distributed, with degrees of freedom equal to the difference in the number of freely estimated parameters between the two models:

$$\lambda \overset{a}{\sim} \chi^2_{df}, \qquad df = \dim(\hat{\theta}) - \dim(\hat{\theta}_c)$$
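The mechanics of the test can be sketched outside the mixture setting. The following is a minimal, hypothetical illustration (not depmixS4 code): a single Gaussian with known variance, testing the one-parameter constraint $\mu = 0$, so $df = 1$ and the p-value has the closed form $\mathrm{erfc}(\sqrt{\lambda/2})$.

```python
# Hypothetical sketch of the likelihood ratio test for a single Gaussian
# with known sigma = 1, testing H0: mu = 0. Names are illustrative only.
import math
import random

def loglik(data, mu, sigma=1.0):
    """Gaussian log-likelihood."""
    n = len(data)
    return (-n / 2 * math.log(2 * math.pi * sigma**2)
            - sum((x - mu) ** 2 for x in data) / (2 * sigma**2))

random.seed(1)
data = [random.gauss(0.3, 1.0) for _ in range(200)]

mu_hat = sum(data) / len(data)                        # unconstrained MLE
lam = 2 * (loglik(data, mu_hat) - loglik(data, 0.0))  # LR statistic

# One constraint is tested, so lambda ~ chi^2 with df = 1 under H0;
# for df = 1 the p-value is erfc(sqrt(lambda / 2)).
p_value = math.erfc(math.sqrt(lam / 2))
```

With known unit variance the statistic reduces algebraically to $n\hat{\mu}^2$, which is a handy sanity check on the computation.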
Code Example
In depmixS4, the likelihood ratio test can be performed using the llratio function:
# Create and fit the unconstrained mixture model
spmix2 <- mix(RT ~ 1, data = speed1, nstates = 2)
fspmix2 <- fit(spmix2, verbose = FALSE)
# Build the constrained model from the fitted parameters
pst <- getpars(fspmix2)
pst[c(4, 6)] <- 0.25 # manually set some parameters to create a constrained model
spmix2c <- setpars(spmix2, pst)
fspmix2c <- fit(spmix2c, equal = c(1, 1, 1, 2, 1, 2), method = "rsolnp")
# The equal argument is used to specify equality constraints:
## parameters with the same integer value in this vector are estimated to be equal,
## except for the values 0 and 1: 0 indicates a fixed parameter and 1 a free one.
llratio(fspmix2, fspmix2c)
Standard Errors and Hessian Matrix
Under regularity conditions, the MLE asymptotically follows a multivariate Gaussian distribution:

$$\sqrt{n}\,(\hat{\theta} - \theta^*) \overset{d}{\to} N\!\left(0,\, I(\theta^*)^{-1}\right)$$

where $n$ is the number of observations and $I(\theta^*)$ is the Fisher information matrix. The Fisher information can be approximated with the observed Fisher information matrix, i.e. the Hessian of the minus log-likelihood evaluated at the estimated parameters:

$$H = -\nabla^2 \ell(\hat{\theta})$$

The last relation shows that the actual variance-covariance matrix of the multivariate Gaussian can be approximated with the inverse of this Hessian. This means that if we know the Hessian matrix $H$, the standard errors of the parameter estimates can be obtained via the square roots of the diagonal elements of its inverse:

$$SE(\hat{\theta}_j) = \sqrt{\left(H^{-1}\right)_{jj}}$$
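The relation between the Hessian and the standard error can be checked on a case with a known answer. The sketch below (hypothetical, not tied to any package here) estimates a Gaussian mean with known $\sigma = 1$, where the Hessian of the minus log-likelihood is $n/\sigma^2$ and so the standard error should come out as $\sigma/\sqrt{n}$.

```python
# Hypothetical sketch: standard error from the Hessian of the minus
# log-likelihood, for a Gaussian mean with known sigma = 1.
# The exact answer is SE = sigma / sqrt(n).
import math
import random

def negloglik(mu, data, sigma=1.0):
    n = len(data)
    return (n / 2 * math.log(2 * math.pi * sigma**2)
            + sum((x - mu) ** 2 for x in data) / (2 * sigma**2))

random.seed(2)
data = [random.gauss(1.0, 1.0) for _ in range(400)]
mu_hat = sum(data) / len(data)  # MLE of the mean

# Second derivative of the minus log-likelihood by central differences.
h = 1e-4
hess = (negloglik(mu_hat + h, data) - 2 * negloglik(mu_hat, data)
        + negloglik(mu_hat - h, data)) / h**2

# Square root of the diagonal of the inverse (here a 1x1 "matrix").
se = math.sqrt(1.0 / hess)
```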
Obtain Hessian as Part of Numerical Optimization
If the log-likelihood is optimized using direct numerical optimization, the Hessian can be estimated during the optimization. For example:
# estimate parameters, asking nlm to also return the Hessian
nlmEst <- nlm(GaussMix2, pstart, y = y, hessian = TRUE)
# GaussMix2 is a function that takes parameter values and y, and returns the negative log-likelihood
# obtain the Hessian
hess1 <- nlmEst$hessian
# get the square root of the diagonal of the inverse
nlmSes <- sqrt(diag(solve(hess1)))
Obtain Hessian via Finite Difference Approximation
require(numDeriv)
y <- speed1$RT
opt <- optim(pstart, fn = GaussMix2, y = y, lower = c(0, -Inf, -Inf, 0, 0), upper = c(1, rep(Inf, 4)), method = "L-BFGS-B")
hess2 <- hessian(GaussMix2, opt$par, y = y)
require(nlme)
hess3 <- fdHess(opt$par, GaussMix2, y = y)$Hessian
# opt$par is the estimated parameter vector from optim
depmixS4 provides vcov(), confint(), and standardError() methods for the variance-covariance matrix, confidence intervals, and standard errors.
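The central finite-difference scheme that numDeriv-style functions use can be written down directly. The sketch below is a hypothetical, minimal implementation for a two-parameter Gaussian $(\mu, \sigma)$ negative log-likelihood, where the observed information at the MLE is known in closed form: $SE(\hat{\mu}) = \hat{\sigma}/\sqrt{n}$ and $SE(\hat{\sigma}) = \hat{\sigma}/\sqrt{2n}$.

```python
# Hypothetical sketch of a central finite-difference Hessian (the same idea
# numDeriv::hessian implements) for a two-parameter Gaussian neg. log-lik.
import math
import random

def negloglik(theta, data):
    mu, sigma = theta
    n = len(data)
    return (n * math.log(sigma) + n / 2 * math.log(2 * math.pi)
            + sum((x - mu) ** 2 for x in data) / (2 * sigma**2))

def fd_hessian(f, theta, data, h=1e-4):
    """Central finite-difference approximation of the Hessian of f."""
    p = len(theta)
    H = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            def shift(di, dj):
                t = list(theta)
                t[i] += di * h
                t[j] += dj * h
                return f(t, data)
            H[i][j] = (shift(1, 1) - shift(1, -1)
                       - shift(-1, 1) + shift(-1, -1)) / (4 * h**2)
    return H

random.seed(3)
data = [random.gauss(0.0, 2.0) for _ in range(500)]
mu_hat = sum(data) / len(data)
sigma_hat = math.sqrt(sum((x - mu_hat) ** 2 for x in data) / len(data))

H = fd_hessian(negloglik, [mu_hat, sigma_hat], data)
# Standard errors from the inverse of the 2x2 Hessian:
det = H[0][0] * H[1][1] - H[0][1] * H[1][0]
se_mu = math.sqrt(H[1][1] / det)
se_sigma = math.sqrt(H[0][0] / det)
```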
Obtain Standard Errors via Parametric Bootstrap
If it is possible to simulate data from the fitted mixture model, the variance-covariance matrix can be obtained via the parametric bootstrap:

1. Fit a mixture model to the data and obtain parameter estimates.
2. Repeat many times:
   - Use the fitted model to simulate a dataset, by first drawing a component and then drawing a data point from that component's distribution.
   - Fit the mixture model to the simulated data and record the estimates.
3. Order the estimates from the simulation process to obtain sample quantiles (their standard deviations give the standard errors).
require(boot)
# ran.gen: simulate a new dataset from the fitted model
speed.rg <- function(data, mle) {
  simulate(data)
}
# statistic: refit the model and return the parameter estimates
speed.fun <- function(data) {
  getpars(fit(data, verbose = FALSE, emcontrol = em.control(random.start = FALSE)))
}
speed_boot_par <- boot(fspmix2, speed.fun, R = 1000, sim = "parametric", ran.gen = speed.rg)
boot's first argument is normally the observed data; in this case we pass the fitted fspmix2 object instead.
boot's second argument is a function that takes the data and returns a vector of parameter estimates.
boot's ran.gen argument requires a function of two arguments: the first is the observed data, and the second carries any other information (typically the MLEs). The function should return the simulated dataset. Because the "data" being passed here is the fitted model rather than the actual observations, we do not need to pass any mle value in the boot call.
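The three-step recipe above can be sketched generically. The following hypothetical example uses a single Gaussian rather than a mixture, so the "refit" step is a one-liner; the bootstrap standard error of the mean should then be close to the theoretical $\hat{\sigma}/\sqrt{n}$.

```python
# Hypothetical sketch of the parametric bootstrap, shown for a single
# Gaussian (not a mixture) so that the refit step stays trivial.
import math
import random
import statistics

def fit_gaussian(data):
    """'Fit' step: MLEs of mean and (population) standard deviation."""
    return statistics.fmean(data), statistics.pstdev(data)

random.seed(4)
data = [random.gauss(5.0, 2.0) for _ in range(300)]
mu_hat, sigma_hat = fit_gaussian(data)   # step 1: fit to observed data

boot_mu = []
for _ in range(500):                     # step 2: repeat
    # simulate from the fitted model, then refit and record the estimate
    sim = [random.gauss(mu_hat, sigma_hat) for _ in range(len(data))]
    boot_mu.append(fit_gaussian(sim)[0])

se_mu = statistics.stdev(boot_mu)        # bootstrap standard error
boot_mu.sort()                           # step 3: order for sample quantiles
ci = (boot_mu[12], boot_mu[487])         # approx. 95% percentile interval
```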
Linear Constraint Correction of Hessian
When parameters are estimated subject to active linear constraints, the covariance matrix obtained from the inverse Hessian must be corrected:

$$C = H^{-1} - H^{-1} A^{\top} \left(A H^{-1} A^{\top}\right)^{-1} A H^{-1}$$

where $H$ is the approximated Hessian and $A$ is the $c \times p$ matrix of the Jacobian of the linear constraints ($c$ is the number of constraints).
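For a single active constraint ($c = 1$) the correction simplifies, since $A$ is a row vector $a$ and $A H^{-1} A^{\top}$ is a scalar. The sketch below uses made-up numbers for illustration; a useful sanity check is that the corrected variance in the constraint direction, $a\, C\, a^{\top}$, is exactly zero.

```python
# Hypothetical numeric sketch of the constraint correction for a single
# active linear constraint (c = 1). All numbers are made up.
p = 2
Hinv = [[0.25, 0.0], [0.0, 0.04]]  # inverse of the approximated Hessian
a = [1.0, 1.0]                     # Jacobian of the constraint theta1 + theta2 = const

# H^{-1} a^T and the scalar a H^{-1} a^T
Ha = [sum(Hinv[i][j] * a[j] for j in range(p)) for i in range(p)]
s = sum(a[i] * Ha[i] for i in range(p))

# Corrected covariance: C = H^{-1} - H^{-1} a^T (a H^{-1} a^T)^{-1} a H^{-1}
C = [[Hinv[i][j] - Ha[i] * Ha[j] / s for j in range(p)] for i in range(p)]
```

Note how the correction induces a negative covariance between the two parameters: with their sum held fixed, any increase in one must be offset by the other.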