Hidden Markov Models

Materials are collected from

Objective 2: Determining hidden states given sequence and model

Obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states given a sequence of observed states

Viterbi Algorithm

Components:

  • Transition matrix $A$ and emission matrix $B$ of the Hidden Markov Model

  • Auxiliary matrix $C_{n \times k}$, which tracks the intermediate optimal probabilities. $n$ is the number of hidden states and $k$ is the number of words in the observed sequence (e.g., a text sequence)

  • Auxiliary matrix $D_{n \times k}$, which tracks the indices of the visited states

Initialization:

  1. Fill the first column of $C$ (the probability of going from the start state to the first word with each hidden state) with: $c_{i,1} = \pi_i * b_{i, cindex(w_1)}$

    • $\pi_i$: the transition probability from the start state $\pi$ to hidden state $i$

    • $b_{i, cindex(w_1)}$: the emission probability of word 1 (which has column index $cindex(w_1)$ in the emission matrix) at hidden state $i$

  2. Fill the first column of $D$ with 0. This is because there is no preceding part-of-speech tag that we traversed from.
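The initialization step can be sketched as follows. This is a minimal sketch, not the notes' own code; the function name `viterbi_init` and the convention that `obs` holds the column indices $cindex(w_j)$ of the observed words are assumptions for illustration.

```python
import numpy as np

def viterbi_init(pi, B, obs):
    """Initialize the auxiliary matrices C and D.

    pi  : start-state distribution (length n)
    B   : emission matrix (n states x vocabulary size)
    obs : list of K word indices, obs[j] playing the role of cindex(w_{j+1})
    """
    n = len(pi)           # number of hidden states
    K = len(obs)          # number of observed words
    C = np.zeros((n, K))
    D = np.zeros((n, K), dtype=int)
    # c_{i,1} = pi_i * b_{i, cindex(w_1)}
    C[:, 0] = pi * B[:, obs[0]]
    # First column of D stays 0: no preceding tag to point back to.
    return C, D
```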

Forward Pass

Populate the rest of the $C$ and $D$ matrices column by column with the following formulas:

$$
\begin{align*}
& c_{i,j} = \max_k c_{k, j-1} * a_{k,i} * b_{i, cindex(w_j)}\\
& d_{i,j} = \argmax_k c_{k, j-1} * a_{k,i} * b_{i, cindex(w_j)}
\end{align*}
$$
  • $a_{k,i}$ is the transition probability from state $k$ to state $i$

Logic: Consider $c_{i,j}$: we want the optimal probability for the word sequence up to word $j$, where word $j$ carries speech tag $t_i$. We find it by iterating over all optimal probabilities $c_{k, j-1}$ of the previous column, i.e. the tag options ending at word $j-1$. For each $c_{k, j-1}$, we multiply by the probability of transitioning from $t_k$ to $t_i$, hence $a_{k,i}$, and by the emission probability of word $j$ at $t_i$, hence $b_{i, cindex(w_j)}$. For the $D$ matrix, we track the argmax, i.e. which previous hidden state we decided to travel from.
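The forward pass above can be sketched directly from the formulas. This is an illustrative sketch under the same assumed conventions as before (`A` the transition matrix, `B` the emission matrix, `obs` the word indices); the inner loop over `i` could be vectorized, but is kept explicit to mirror the notation.

```python
import numpy as np

def viterbi_forward(A, B, obs, C, D):
    """Fill columns 1..K-1 (0-based) of C and D in place.

    For each state i at position j, score every predecessor state k with
    c_{k,j-1} * a_{k,i} * b_{i, cindex(w_j)}, keep the max in C and the
    argmax (the best predecessor's index) in D.
    """
    n, K = C.shape
    for j in range(1, K):
        for i in range(n):
            scores = C[:, j - 1] * A[:, i] * B[i, obs[j]]
            C[i, j] = scores.max()
            D[i, j] = scores.argmax()
    return C, D
```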

Backward Pass

  1. Get $s = \argmax_i c_{i, K}$. The probability at index $s$ is the posterior probability estimate of the sequence.

  2. Go to column $K$ of $D$ and find the index at row $s$, i.e. $tag_{K-1} = d_{s, K}$

  3. Go to the next column to the left ($K-1$) and find the index at row $tag_{K-1}$, i.e. $tag_{K-2} = d_{tag_{K-1}, K-1}$. Repeat until the first column is reached; reading the collected indices in reverse order gives the most likely tag sequence.
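The backward pass can be sketched as a walk through $D$ from right to left. The function name `viterbi_backward` is assumed for illustration; it returns the hidden-state indices in sentence order.

```python
import numpy as np

def viterbi_backward(C, D):
    """Recover the most likely hidden-state sequence from C and D.

    Start at s = argmax_i c_{i,K} in the last column, then follow the
    stored argmax indices in D right to left, and reverse the result.
    """
    n, K = C.shape
    s = int(C[:, K - 1].argmax())   # best final state
    path = [s]
    for j in range(K - 1, 0, -1):
        s = int(D[s, j])            # best predecessor of state s at word j
        path.append(s)
    path.reverse()
    return path
```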

Considerations: Log Probability

Because we are multiplying many small numbers, we should again consider using log probabilities, hence

$$
\begin{align*}
\log(c_{i,j}) = \max_k \left[ \log(c_{k, j-1}) + \log(a_{k,i}) + \log(b_{i, cindex(w_j)}) \right]
\end{align*}
$$
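A log-space forward pass looks the same as the ordinary one, with sums of logs replacing products. This sketch assumes the inputs are already element-wise logs of $\pi$, $A$, and $B$; exponentiating the result recovers the original probabilities (up to floating-point error).

```python
import numpy as np

def viterbi_forward_log(log_pi, log_A, log_B, obs):
    """Viterbi forward pass in log space to avoid numerical underflow.

    log_pi, log_A, log_B : element-wise logs of the start distribution,
                           transition matrix, and emission matrix
    obs                  : list of K observed word indices
    """
    n, K = len(log_pi), len(obs)
    logC = np.full((n, K), -np.inf)
    D = np.zeros((n, K), dtype=int)
    # log c_{i,1} = log pi_i + log b_{i, cindex(w_1)}
    logC[:, 0] = log_pi + log_B[:, obs[0]]
    for j in range(1, K):
        for i in range(n):
            scores = logC[:, j - 1] + log_A[:, i] + log_B[i, obs[j]]
            logC[i, j] = scores.max()
            D[i, j] = scores.argmax()
    return logC, D
```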
