Given observed covariates $z_{1:T}$, assume that we can divide them into covariates for the prior, $z_t^{(pr)}$, and covariates for the observation model, $z_t^{(obs)}$. The joint log-likelihood is
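$$\log f(Y_{1:T}, S_{1:T} \mid \theta, z_{1:T}) = \sum_{t=1}^{T} \log P(S_t \mid \theta_{pr}, z_t^{(pr)}) + \sum_{t=1}^{T} \log f(Y_t \mid S_t, \theta_{obs}, z_t^{(obs)})$$

Generally, we can model the effect of the covariates on the prior state probabilities with a multinomial logistic regression: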
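$$\log \frac{P(S_t = i \mid \theta_{pr}, z_t^{(pr)})}{P(S_t = N \mid \theta_{pr}, z_t^{(pr)})} = z_t^{(pr)} \beta_i$$

$$P(S_t = i \mid \theta_{pr}, z_t^{(pr)}) = \frac{\exp(z_t^{(pr)} \beta_i)}{\sum_{j=1}^{N} \exp(z_t^{(pr)} \beta_j)}$$

where the parameters of the baseline state (e.g., state $N$) are fixed at 0, i.e. $\beta_N = 0$.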
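As a minimal sketch, these prior probabilities can be computed with NumPy, assuming the covariate vectors $z_t^{(pr)}$ are stacked as rows of a `(T, p)` array `Z` and the free coefficients $\beta_1, \dots, \beta_{N-1}$ are stored as columns of a `(p, N - 1)` array `beta`. The function name `prior_probs` and this array layout are illustrative choices, not part of the original notes.

```python
import numpy as np


def prior_probs(Z, beta):
    """Prior state probabilities P(S_t = i | theta_pr, z_t^(pr)).

    Z    : (T, p) array; row t is the prior covariate vector z_t^(pr).
    beta : (p, N - 1) array of coefficients for states 1..N-1.
           The baseline state N has beta_N = 0, appended below.
    Returns a (T, N) array whose rows sum to 1.
    """
    T = Z.shape[0]
    # Linear predictors z_t beta_i, plus a zero column for the baseline state N.
    eta = np.hstack([Z @ beta, np.zeros((T, 1))])
    # Subtracting the row maximum leaves the softmax unchanged but avoids overflow.
    eta -= eta.max(axis=1, keepdims=True)
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)
```

Because the baseline column is exactly zero, $\log\big(P(S_t = i)/P(S_t = N)\big)$ recovers $z_t^{(pr)} \beta_i$ for every non-baseline state, matching the log-odds form above.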
Because $\theta_{pr}$ appears only in the first sum of the joint log-likelihood, the M-step of the EM algorithm for $\theta_{pr}$ reduces to finding the values that maximize
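$$\sum_{t=1}^{T} \sum_{i=1}^{N} \gamma_t(i) \log P(S_t = i \mid \theta_{pr}, z_t^{(pr)}),$$

where $\gamma_t(i) = P(S_t = i \mid Y_{1:T}, z_{1:T}, \theta^{\text{old}})$ is the posterior probability of state $i$ at time $t$ computed in the E-step.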
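Since $\gamma_t(i)$ is held fixed during the M-step, this objective is a weighted multinomial logistic regression log-likelihood and is concave in $\beta$. Assuming each row of $\gamma$ sums to 1, its gradient with respect to $\beta_i$ ($i < N$) is $\sum_t \big(\gamma_t(i) - P(S_t = i \mid \theta_{pr}, z_t^{(pr)})\big)\, z_t^{(pr)\top}$. The sketch below maximizes it by plain gradient ascent; that optimizer is an illustrative choice (Newton-Raphson/IRLS or a generic optimizer would usually need fewer iterations), and the names `softmax_with_baseline` and `m_step_prior` are hypothetical.

```python
import numpy as np


def softmax_with_baseline(Z, beta):
    """P(S_t = i | theta_pr, z_t^(pr)) with beta_N = 0 for the baseline state N."""
    eta = np.hstack([Z @ beta, np.zeros((Z.shape[0], 1))])
    eta -= eta.max(axis=1, keepdims=True)  # numerical stability
    expo = np.exp(eta)
    return expo / expo.sum(axis=1, keepdims=True)


def m_step_prior(Z, gamma, n_iter=5000, tol=1e-10):
    """Gradient ascent on (1/T) * sum_t sum_i gamma_t(i) log P(S_t = i | ...).

    Z     : (T, p) array of prior covariates z_t^(pr).
    gamma : (T, N) array of E-step posteriors; each row is assumed to sum to 1.
    Returns the (p, N - 1) coefficients for states 1..N-1 (beta_N stays 0).
    """
    T, p = Z.shape
    N = gamma.shape[1]
    beta = np.zeros((p, N - 1))
    # The softmax covariance diag(P_t) - P_t P_t^T is bounded by the identity,
    # so the Hessian of the averaged objective has norm at most
    # lambda_max(Z^T Z) / T; a step of 1 / L then never decreases the objective.
    L = np.linalg.eigvalsh(Z.T @ Z).max() / T
    for _ in range(n_iter):
        P = softmax_with_baseline(Z, beta)
        # d/d beta_i = (1/T) * sum_t (gamma_t(i) - P_t(i)) z_t, for i = 1..N-1
        grad = Z.T @ (gamma[:, :-1] - P[:, :-1]) / T
        beta += grad / L
        if np.max(np.abs(grad)) < tol:
            break
    return beta
```

If $\gamma$ is itself generated by `softmax_with_baseline` from some coefficients and `Z` has full column rank, the maximizer is exactly those coefficients, which gives a quick sanity check of the update.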