NLP-C2-W2: PoS Tagging and HMM
https://www.coursera.org/learn/probabilistic-models-in-nlp/home/week/2
Document Processing
Dealing with Unknown Words When Processing a Document
We can replace unknown words with special unknown-word tokens chosen by the word's character class, such as "--unk_digit--", "--unk_punct--", etc. Notebook practice
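The idea can be sketched as a small classifier over character classes. The token names "--unk_digit--" and "--unk_punct--" come from the notes; the exact rules below are illustrative assumptions, not the course's full rule set.

```python
import string

def assign_unk(word):
    """Map an out-of-vocabulary word to a coarse unknown-word token.

    The rules are a minimal sketch: check for digits, then punctuation,
    then fall back to a generic unknown token.
    """
    if any(ch.isdigit() for ch in word):
        return "--unk_digit--"
    if any(ch in string.punctuation for ch in word):
        return "--unk_punct--"
    return "--unk--"
```

At preprocessing time, every word not in the training vocabulary is replaced by its token, so the emission tables only ever see a closed vocabulary.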
PoS Transition HMM
Setup:
Hidden nodes: Part of Speech, e.g.: verb, noun
Observable nodes: Actual words, e.g.: like, use
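The setup above amounts to two probability tables: transitions between hidden tags and emissions of observed words. A toy sketch, with made-up illustrative numbers (not course data):

```python
# Toy HMM: hidden PoS tags (NN, VB) emit observed words.
transition = {           # P(next tag | current tag)
    "NN": {"NN": 0.2, "VB": 0.8},
    "VB": {"NN": 0.7, "VB": 0.3},
}
emission = {             # P(word | tag)
    "NN": {"use": 0.6, "like": 0.4},
    "VB": {"use": 0.5, "like": 0.5},
}

# Joint probability of moving NN -> VB and then observing "like":
p = transition["NN"]["VB"] * emission["VB"]["like"]
```

Tagging a sentence means finding the hidden tag path that maximizes the product of such transition and emission terms.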
Smoothing in Calculating Transition Probabilities
Original transition probability ($t_i$ is the tag at location $i$):

$$P(t_i \mid t_{i-1}) = \frac{C(t_{i-1}, t_i)}{\sum_{j=1}^{N} C(t_{i-1}, t_j)}$$

We can add smoothing to deal with counts of 0, which can cause 1) a division-by-zero problem in the probability calculation and 2) a probability of 0, which doesn't generalize well. So, we calculate the transition probability as follows:

$$P(t_i \mid t_{i-1}) = \frac{C(t_{i-1}, t_i) + \epsilon}{\sum_{j=1}^{N} C(t_{i-1}, t_j) + N\epsilon}$$

$N$ is the total number of tags
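The smoothed transition formula translates directly into code. A minimal sketch, assuming pair counts are stored in a dict keyed by `(prev_tag, tag)`:

```python
def transition_prob(counts, prev_tag, tag, tags, eps=0.001):
    """Smoothed P(tag | prev_tag).

    counts: dict mapping (prev_tag, tag) pairs to corpus counts.
    Numerator: C(prev_tag, tag) + eps.
    Denominator: sum over all tags of C(prev_tag, t), plus N * eps.
    """
    num = counts.get((prev_tag, tag), 0) + eps
    den = sum(counts.get((prev_tag, t), 0) for t in tags) + len(tags) * eps
    return num / den
```

Note that even a tag pair never seen in training gets a small nonzero probability, which is exactly the generalization benefit smoothing buys.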
Smoothing in Calculating Emission Probabilities
Following the same principle, we calculate emission probabilities as:

$$P(w_i \mid t_i) = \frac{C(t_i, w_i) + \epsilon}{\sum_{j=1}^{V} C(t_i, w_j) + V\epsilon}$$

$V$ is the total number of words (the vocabulary size)
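The emission case mirrors the transition case, only keyed by `(tag, word)` and normalized over the vocabulary. A sketch under the same dict-of-counts assumption:

```python
def emission_prob(counts, tag, word, vocab, eps=0.001):
    """Smoothed P(word | tag).

    counts: dict mapping (tag, word) pairs to corpus counts.
    vocab: the full word vocabulary of size V.
    """
    num = counts.get((tag, word), 0) + eps
    den = sum(counts.get((tag, w), 0) for w in vocab) + len(vocab) * eps
    return num / den
```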
Deep Dive into Hidden Markov Models
Code
Counter
```python
from collections import Counter

# Count the characters of a string and return the three most common
Counter('abracadabra').most_common(3)
# [('a', 5), ('b', 2), ('r', 2)]
```
Completed Notebook
Clear structure in pre-processing and actual modeling
Viterbi algorithm implementation
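The Viterbi implementation in the notebook can be sketched as follows. This is a minimal log-space version, assuming dict-of-dicts probability tables (`trans_p[prev][tag]`, `emit_p[tag][word]`, `init_p[tag]`) with smoothed, nonzero entries; the notebook's actual version uses matrices and a separate backtrace step.

```python
import math

def viterbi(words, tags, trans_p, emit_p, init_p):
    """Return the most likely tag sequence for `words`.

    Works in log space to avoid floating-point underflow on long
    sentences; each column stores, per tag, the best score so far
    and the path that achieved it.
    """
    best = [{t: (math.log(init_p[t]) + math.log(emit_p[t][words[0]]), [t])
             for t in tags}]
    for w in words[1:]:
        col = {}
        for t in tags:
            score, path = max(
                (best[-1][p][0] + math.log(trans_p[p][t])
                 + math.log(emit_p[t][w]), best[-1][p][1])
                for p in tags)
            col[t] = (score, path + [t])
        best.append(col)
    return max(best[-1].values())[1]
```

Keeping the full path per cell trades memory for simplicity; the standard formulation keeps only backpointers and reconstructs the path at the end.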