Let X be an m×n matrix whose rows are the embeddings of English words, and let Y be the matrix whose rows are the embeddings of the corresponding French words. The translation matrix R is the matrix that minimizes the squared Frobenius norm of the difference, $\|XR - Y\|_F^2$.
R can be obtained by iterative gradient-descent updates:
$$\mathrm{Loss} = \frac{1}{m}\,\|XR - Y\|_F^2, \qquad g = \frac{d\,\mathrm{Loss}}{dR} = \frac{2}{m}\,X^T(XR - Y), \qquad R \leftarrow R - \alpha\, g$$
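A minimal NumPy sketch of these updates (the random initialization, step count, and learning rate are illustrative choices, not prescribed above):

```python
import numpy as np

def align_embeddings(X, Y, steps=1000, learning_rate=0.01):
    """Learn R minimizing (1/m) * ||XR - Y||_F^2 by gradient descent.

    X: (m, n) English embeddings, one word per row.
    Y: (m, n) corresponding French embeddings.
    """
    m, n = X.shape
    R = np.random.rand(n, n)  # illustrative random initialization
    for _ in range(steps):
        gradient = (2 / m) * (X.T @ (X @ R - Y))  # d Loss / dR
        R -= learning_rate * gradient             # R <- R - alpha * g
    return R
```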
Using K-Nearest Neighbors for Translation
Once we have R, we can map an English word vector into the French word vector space. However, the mapped vector will rarely coincide exactly with a French word vector, so we use K-nearest neighbor search to identify exactly which word to translate to.
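A small sketch of that lookup, assuming cosine similarity as the distance measure; `french_embeddings` (the French embedding matrix) and `english_vec` are hypothetical names used for illustration:

```python
import numpy as np

def nearest_neighbors(v, candidates, k=1):
    """Return indices of the k rows of `candidates` most cosine-similar to v."""
    sims = (candidates @ v) / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(v))
    return np.argsort(-sims)[:k]  # highest similarity first

# XR rarely lands exactly on a French vector, so take the closest rows:
# top_k = nearest_neighbors(english_vec @ R, french_embeddings, k=3)
```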
Locality Sensitive Hashing for Faster Search
We can define n hyperplanes with a normal-vector matrix V. Each word can be represented with $w_i = 0$ or $w_i = 1$ according to whether it lies below or above hyperplane i. The final hash value of each word can then be calculated as $\sum_{i=0}^{n-1} 2^i\, w_i$.
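A sketch of this hash function, assuming the hyperplanes pass through the origin so that the sign of the dot product with each row (normal vector) of V decides the side:

```python
import numpy as np

def hash_value(v, V):
    """Hash vector v with the n hyperplanes whose normals are the rows of V."""
    w = (V @ v >= 0).astype(int)                    # w_i: side of hyperplane i
    return int(np.sum(2 ** np.arange(len(w)) * w))  # sum_i 2^i * w_i
```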
So, overall, we first determine the number of hash buckets h. This can be decided by dividing the total number of items by the desired item count per bucket. Then, we generate n hyperplanes, where n is chosen so that $2^n = h$. Under one set of hyperplanes, two items with the same hash value land in the same bucket. We can repeat this process with several independent sets of hyperplanes to improve accuracy.
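Building one such table might look like the sketch below, which reuses `hash_value` from above; the bucket arithmetic in the comment is an illustrative example:

```python
import numpy as np
from collections import defaultdict

def build_hash_table(vectors, n_planes):
    """Group row vectors into 2**n_planes buckets via one random set of hyperplanes."""
    V = np.random.randn(n_planes, vectors.shape[1])  # one normal vector per row
    table = defaultdict(list)
    for idx, v in enumerate(vectors):
        table[hash_value(v, V)].append(idx)
    return V, table

# e.g. 10,000 items at ~16 per bucket gives h ~ 625 buckets,
# so n_planes = ceil(log2(625)) = 10 hyperplanes.
```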
Approximate K-Nearest Neighbors
Use multiple sets of hyperplanes (one hash table each) to determine the final set of vectors that are potential candidates for nearest neighbors.
Flow:
Generate multiple hash tables
Given a target vector, hash it with each table's hyperplanes and collect all items that share its bucket, across all hash tables.
Run the exact nearest neighbor search on the reduced candidate list, as in the sketch below.
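Putting the flow together, a sketch that pools bucket members across tables (reusing `hash_value` and the `(V, table)` pairs returned by `build_hash_table` above), then runs an exact cosine-similarity search over the pooled candidates:

```python
import numpy as np

def approximate_knn(v, vectors, tables, k=3):
    """tables: list of (V, table) pairs, one per independent set of hyperplanes."""
    candidates = set()
    for V, table in tables:                       # step 2: items sharing a bucket
        candidates.update(table.get(hash_value(v, V), []))
    candidates = list(candidates)
    subset = vectors[candidates]                  # step 3: exact search on subset
    sims = (subset @ v) / (
        np.linalg.norm(subset, axis=1) * np.linalg.norm(v))
    return [candidates[i] for i in np.argsort(-sims)[:k]]

# tables = [build_hash_table(vectors, n_planes=10) for _ in range(5)]
# neighbors = approximate_knn(query, vectors, tables, k=3)
```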