Hidden Markov Models
Objective 2: Determining hidden states given sequence and model
Obtaining the maximum a posteriori probability estimate of the most likely sequence of hidden states given a sequence of observed states
Viterbi Algorithm
Components:
Transition matrix $A$ and emission matrix $B$ of the Hidden Markov Model
Auxiliary matrix $C_{n \times K}$, which tracks the intermediate optimal probabilities. $n$ is the number of hidden states and $K$ is the number of words in the observed sequence (e.g., a text sequence)
Auxiliary matrix $D_{n \times K}$, which tracks the indices of visited states
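As a concrete picture of these components, here is a minimal NumPy sketch; the matrix sizes, probability values, and variable names are made-up assumptions for illustration, not values from these notes:

```python
import numpy as np

# Toy setup, illustration only: n hidden states (tags), K observed words.
n, K = 2, 3
A = np.array([[0.7, 0.3],        # A[k, i]: transition probability from tag k to tag i
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],   # B[i, w]: emission probability of word column w at tag i
              [0.1, 0.3, 0.6]])
C = np.zeros((n, K))             # intermediate optimal probabilities
D = np.zeros((n, K), dtype=int)  # indices of visited states (backpointers)
```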
Initialization:
Fill the first column of $C$ (the probability of going from the start state to the first word under each hidden state) with: $c_{i,1} = \pi_i \cdot b_{i,\mathrm{cindex}(w_1)}$
$\pi_i$: the transition probability from the start state $\pi$ to hidden state $i$
$b_{i,\mathrm{cindex}(w_1)}$: the emission probability of word 1 (which has column index $\mathrm{cindex}(w_1)$ in the emission matrix) at hidden state $i$.
Fill the first column of $D$ with 0. This is because there are no preceding part-of-speech tags that we traversed from.
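A minimal sketch of this initialization step, assuming a NumPy setup like the one above (the function name `viterbi_init` and its signature are my own, not from the notes):

```python
import numpy as np

def viterbi_init(B, pi, first_word_col, K):
    """Build C and D and fill their first columns.

    B: (n, V) emission matrix; pi: (n,) start-state probabilities;
    first_word_col: cindex(w_1); K: number of observed words.
    """
    n = B.shape[0]
    C = np.zeros((n, K))
    D = np.zeros((n, K), dtype=int)
    C[:, 0] = pi * B[:, first_word_col]  # c_{i,1} = pi_i * b_{i, cindex(w_1)}
    # D[:, 0] stays 0: there is no preceding tag to point back to.
    return C, D
```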
Forward Pass
Populate the rest of the $C$ and $D$ matrices column by column with the following formulas:
$c_{i,j} = \max_k \, c_{k,j-1} \cdot a_{k,i} \cdot b_{i,\mathrm{cindex}(w_j)}$
$d_{i,j} = \operatorname{argmax}_k \, c_{k,j-1} \cdot a_{k,i} \cdot b_{i,\mathrm{cindex}(w_j)}$
$a_{k,i}$ is the transition probability from state $k$ to state $i$
Logic: for $c_{i,j}$, we want the optimal probability for the word sequence up to word $j$, where word $j$'s part-of-speech tag is $t_i$. We find it by iterating over all possible previous entries $c_{k,j-1}$, i.e., the tag options ending at word $j-1$. For each optimal probability $c_{k,j-1}$, we then multiply by the probability of transitioning from $t_k$ to $t_i$, hence the $a_{k,i}$, and by the emission probability of word $j$ at $t_i$, hence the $b_{i,\mathrm{cindex}(w_j)}$. For the $D$ matrix, we want to track the argmax: which previous hidden state we decided to travel from.
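Putting the forward pass together, a hedged sketch (names assumed as before; `obs_cols` is the list of emission-matrix column indices $\mathrm{cindex}(w_j)$ for the observed words):

```python
import numpy as np

def viterbi_forward(A, B, pi, obs_cols):
    """Fill C and D column by column, left to right.

    A[k, i]: transition prob. from tag k to tag i; B[i, w]: emission prob.;
    pi[i]: start-state prob.; obs_cols[j]: emission column cindex(w_j).
    """
    n, K = A.shape[0], len(obs_cols)
    C = np.zeros((n, K))
    D = np.zeros((n, K), dtype=int)
    C[:, 0] = pi * B[:, obs_cols[0]]       # initialization column

    for j in range(1, K):                  # each remaining word
        for i in range(n):                 # each candidate tag t_i for word j
            # score of arriving at t_i from every previous tag t_k
            scores = C[:, j - 1] * A[:, i] * B[i, obs_cols[j]]
            C[i, j] = scores.max()         # optimal probability
            D[i, j] = scores.argmax()      # best previous state k
    return C, D
```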
Backward Pass
Get $s = \operatorname{argmax}_i \, c_{i,K}$. The probability at row $s$, $c_{s,K}$, is the posterior probability estimate of the most likely sequence.
We go to column $K$ of $D$ and find the index at row $s$, i.e., $\mathrm{tag}_{K-1} = d_{s,K}$
Go one column to the left (column $K-1$) and find the index at row $\mathrm{tag}_{K-1}$, i.e., $\mathrm{tag}_{K-2} = d_{\mathrm{tag}_{K-1},\,K-1}$. Repeat until the first column is reached.
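A sketch of the backward pass on top of the filled matrices (function name assumed):

```python
import numpy as np

def viterbi_backward(C, D):
    """Recover the most likely tag sequence from the filled C and D matrices."""
    K = C.shape[1]
    s = int(np.argmax(C[:, K - 1]))   # s = argmax_i c_{i,K}: tag of the last word
    path = [s]
    for j in range(K - 1, 0, -1):     # walk columns K, K-1, ..., 2 right to left
        s = int(D[s, j])              # backpointer gives the previous word's tag
        path.append(s)
    path.reverse()                    # left-to-right sequence of tag indices
    return path
```

Chaining the two passes, `viterbi_backward(*viterbi_forward(A, B, pi, obs_cols))` returns the decoded tag indices, and `C[:, -1].max()` is the probability of that sequence.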
Considerations: Log Probability
Because we are multiplying many small numbers, we should again consider using log probabilities to avoid numerical underflow, hence: $\log c_{i,j} = \max_k \left( \log c_{k,j-1} + \log a_{k,i} + \log b_{i,\mathrm{cindex}(w_j)} \right)$
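A log-space variant of the forward pass, sketched under the same assumed interface as above (note that `np.log(0)` yields `-inf`, so zero-probability entries should be smoothed or masked in practice):

```python
import numpy as np

def viterbi_forward_log(A, B, pi, obs_cols):
    """Forward pass in log space: products become sums, avoiding underflow."""
    n, K = A.shape[0], len(obs_cols)
    logA, logB, logpi = np.log(A), np.log(B), np.log(pi)  # assumes no hard zeros
    C = np.zeros((n, K))
    D = np.zeros((n, K), dtype=int)
    C[:, 0] = logpi + logB[:, obs_cols[0]]

    for j in range(1, K):
        for i in range(n):
            scores = C[:, j - 1] + logA[:, i] + logB[i, obs_cols[j]]
            C[i, j] = scores.max()
            D[i, j] = scores.argmax()
    return C, D   # C now holds log-probabilities; the backward pass is unchanged
```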