NLP-C2-W2: PoS Tagging and HMM
https://www.coursera.org/learn/probabilistic-models-in-nlp/home/week/2
Document Processing
Dealing with Unknown Words When Processing a Document
We can replace unknown words with different unknown tokens, such as "--unk_digit--" and "--unk_punct--", chosen according to the word's surface features. Notebook practice
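A minimal sketch of feature-based unknown-token assignment; the rules and any token names beyond "--unk_digit--" and "--unk_punct--" are illustrative assumptions, not the notebook's exact logic:

```python
import string

def assign_unk(word):
    """Map an out-of-vocabulary word to a feature-based unknown token."""
    if any(ch.isdigit() for ch in word):
        return "--unk_digit--"
    if any(ch in string.punctuation for ch in word):
        return "--unk_punct--"
    if word and word[0].isupper():
        return "--unk_upper--"        # hypothetical: possibly a proper noun
    if word.endswith(("ing", "ed", "ize")):
        return "--unk_verb--"         # hypothetical: common verb suffixes
    return "--unk--"

print(assign_unk("42nd"))       # --unk_digit--
print(assign_unk("tax-free"))   # --unk_punct--
```

Grouping unknown words by features like these preserves some signal for the tagger instead of collapsing every out-of-vocabulary word into a single token.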
PoS Transition HMM
Hidden nodes: parts of speech, e.g., verb, noun
Observable nodes: actual words, e.g., "like", "use"
Smoothing in Calculating Transition Probabilities
Original transition probability ($t_i$ is the tag at position $i$):
$$
\begin{align*}
P(t_i|t_{i-1}) = \frac{Count(t_{i-1}, t_i)}{\sum_{j=1}^N Count(t_{i-1}, t_j)}
\end{align*}
$$

We can add smoothing to handle zero counts, which cause 1) division-by-zero errors in the probability calculation and 2) probabilities of exactly 0, which do not generalize well. So we calculate the transition probability as follows:
$$
\begin{align*}
P(t_i|t_{i-1}) = \frac{Count(t_{i-1}, t_i) + \epsilon}{\sum_{j=1}^N Count(t_{i-1}, t_j) + N \cdot \epsilon}
\end{align*}
$$

where $N$ is the total number of tags.
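As a concrete illustration, here is a minimal sketch of the smoothed transition computation; the toy tag set, counts, and variable names (`trans_counts`, `A`) are assumptions, not the notebook's exact code:

```python
import numpy as np

tags = ["NN", "VB", "O"]                     # toy tag set, N = 3
trans_counts = {("NN", "VB"): 8, ("NN", "NN"): 2, ("VB", "NN"): 6}

N = len(tags)
epsilon = 0.001
A = np.zeros((N, N))                         # A[i, j] = P(tags[j] | tags[i])

for i, t_prev in enumerate(tags):
    row_total = sum(trans_counts.get((t_prev, t), 0) for t in tags)
    for j, t in enumerate(tags):
        count = trans_counts.get((t_prev, t), 0)
        A[i, j] = (count + epsilon) / (row_total + N * epsilon)

print(A.sum(axis=1))   # each row sums to 1, and no entry is exactly 0
```

Note that even the "O" row, which has no observed counts at all, gets a valid uniform distribution instead of a division by zero.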
Smoothing in Calculating Emission Probabilities
Following the same principle, we can calculate emission probabilities as
$$
\begin{align*}
P(w_i|t_i) = \frac{Count(t_i, w_i) + \epsilon}{\sum_{j=1}^V Count(t_i, w_j) + V \cdot \epsilon}
\end{align*}
$$

where $V$ is the total number of words in the vocabulary.
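The emission matrix can be built with the same pattern, summing over the vocabulary instead of the tag set; again, the counts and names (`emit_counts`, `B`) are toy assumptions:

```python
import numpy as np

tags = ["NN", "VB", "O"]
vocab = ["like", "use", "--unk--"]           # toy vocabulary, V = 3
emit_counts = {("VB", "like"): 5, ("VB", "use"): 3, ("NN", "use"): 1}

N, V = len(tags), len(vocab)
epsilon = 0.001
B = np.zeros((N, V))                         # B[i, k] = P(vocab[k] | tags[i])

for i, t in enumerate(tags):
    row_total = sum(emit_counts.get((t, w), 0) for w in vocab)
    for k, w in enumerate(vocab):
        B[i, k] = (emit_counts.get((t, w), 0) + epsilon) / (row_total + V * epsilon)

print(B.sum(axis=1))   # each row sums to 1
```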
Deep Dive into Hidden Markov Models
Completed Notebook
Part of Speech Tagging
Clear structure in pre-processing and actual modeling
Viterbi algorithm implementation
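Relating to the Viterbi implementation above, here is a minimal log-space decoding sketch; the function signature, variable names, and the log-space formulation are my assumptions, not necessarily the notebook's exact approach. It assumes the transition matrix `A`, emission matrix `B`, and an initial tag distribution `pi` have already been built (e.g., with the smoothing shown earlier):

```python
import numpy as np

def viterbi(obs, pi, A, B):
    """obs: word indices; pi: (N,); A: (N, N); B: (N, V). Returns best tag path."""
    N, T = A.shape[0], len(obs)
    log_prob = np.full((N, T), -np.inf)      # best log-probability per (tag, step)
    back = np.zeros((N, T), dtype=int)       # backpointers for path recovery

    # Initialization: start distribution times emission of the first word.
    log_prob[:, 0] = np.log(pi) + np.log(B[:, obs[0]])

    # Forward pass: for each step, keep the best incoming transition per tag.
    for t in range(1, T):
        for j in range(N):
            scores = log_prob[:, t - 1] + np.log(A[:, j]) + np.log(B[j, obs[t]])
            back[j, t] = np.argmax(scores)
            log_prob[j, t] = scores[back[j, t]]

    # Backward pass: walk the backpointers from the best final state.
    path = [int(np.argmax(log_prob[:, -1]))]
    for t in range(T - 1, 0, -1):
        path.append(back[path[-1], t])
    return path[::-1]
```

Working in log space turns products of many small probabilities into sums, avoiding numerical underflow on longer sentences; with the smoothed matrices, no entry is exactly 0, so `np.log` never sees a zero.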