Filter candidates (only restrict to correctly spelled word)
Calculate wor probability p(w)=Vcount(w) where Vis the size of the corpus. A more advanced version would be to track two words co-occurring and use the word before a misspelled word to identify the correct autocorrect.
Find Minimum Edit Distance with Dynamic Programming
Given word x (source) and y (target), construct a matrix where the columns are alphabets of word x and rows are alphabets of y. Both should include a blank / starting line character. Cell i,j means the minimum edit distance that goes from wx[:i]to wy[:j]. Notice here wx[:j] refers to all the characters include i. Therefore, the minimum edit distance problem can be decomposed into
We can look at example below transforming from "play" to "stay" with insert and delete cost 1 and replace cost of 2.
#play#01234s12345t23456a34545y45654
We can pick cell (a,s) as an example, which means we want to know the edit distance from "pla" to "s":
If we are at cell (l, s), we know the edit distance from "pl" to "s". So we can do a deletion to go from "pla" to "pl", then use the knowledge to go from "pl" to "s"/
If we are at cell (a, #), we know the edit distance from "pla" to "#". So we can do an insertion of "s" after going from "pla" to "#"
If we are at cell (l, #), we know the edit distance from "pl" to "#". Since the target is "pla" to "s", and "a" is not same as "s", we have to do a replacement. Imagine if we are going from "pls" to "s", then it would be the same cost as going from "pl" to "#".
Backtracing
As we construct the table, we need to keep a pointer in each cell on how I get to the cell.