NLP-C1-W4: Machine Translation and Document Search

https://www.coursera.org/learn/classification-vector-spaces-in-nlp/home/week/4

Machine Translation Key Idea

Let $X$ be an $m \times n$ matrix whose rows are English word vectors, and let $Y$ be the matrix whose rows are the corresponding French word vectors. The translation matrix $R$ is the matrix that minimizes the squared Frobenius norm of the difference: $\lVert XR - Y \rVert^2_F$

$R$ can be obtained by iterative gradient-descent updates:

$$
\begin{align*}
\text{Loss} &= \lVert XR - Y \rVert^2_F \\
g &= \frac{d}{dR}\,\text{Loss} = \frac{2}{m}\,X^T(XR - Y) \\
R &= R - \alpha g
\end{align*}
$$
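The update rule above can be sketched as follows. This is a minimal toy version (the helper name, step count, and learning rate are my own choices, not the course's exact code); $Y$ is constructed as an exact rotation of $X$, so the loss should shrink toward zero.

```python
import numpy as np

np.random.seed(0)

def align_embeddings(X, Y, steps=500, learning_rate=0.3):
    # Gradient descent on Loss = ||XR - Y||_F^2
    m = X.shape[0]
    R = np.random.rand(X.shape[1], Y.shape[1])  # random initial guess
    for _ in range(steps):
        gradient = (2 / m) * X.T @ (X @ R - Y)
        R -= learning_rate * gradient
    return R

# Toy check: Y is an exact rotation of X, so the learned R should recover it
theta = np.pi / 4
rotation = np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])
X = np.random.rand(10, 2)
Y = X @ rotation
R = align_embeddings(X, Y)
loss = np.linalg.norm(X @ R - Y) ** 2
```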

Using K-Nearest Neighbors for Translation

Once we have the matrix $R$, we can transform an English word vector into the French word vector space. However, we then need a K-nearest neighbor search to identify exactly which French word to translate to.
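The nearest-neighbor lookup itself can be sketched with cosine similarity. The French vectors and the transformed vector below are made-up illustrative data, not real embeddings:

```python
import numpy as np

def nearest_neighbor(v, candidates, k=1):
    # Cosine similarity between v and each candidate row; highest first
    sims = candidates @ v / (np.linalg.norm(candidates, axis=1) * np.linalg.norm(v))
    return np.argsort(-sims)[:k]

# Hypothetical French word vectors and a transformed English vector (X @ R)
french_vectors = np.array([[1.0, 0.0],
                           [0.0, 1.0],
                           [0.9, 0.1]])
transformed = np.array([1.0, 0.2])
top_two = nearest_neighbor(transformed, french_vectors, k=2)  # rows 2 and 0
```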

We can define $n$ hyperplanes with a normal vector matrix $V$. Each word is assigned $w_i = 0$ or $w_i = 1$ depending on whether it lies below or above hyperplane $i$. The final hash value of each word is then $\sum_{i=0}^{n-1} 2^i w_i$.

So, overall, we first determine the number of hash buckets $h$, e.g. by dividing the total number of items by the desired number of items per bucket. Then we generate $n$ hyperplanes such that $2^n = h$. For a given set of hyperplanes, two items with the same hash value fall in the same bucket. We can repeat this process with several independent sets of hyperplanes to improve accuracy.
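The hash-value formula can be sketched as below. The three hyperplane normal vectors and the test point are hypothetical, chosen so the result is easy to check by hand:

```python
import numpy as np

def hash_value(planes, v):
    # w_i = 1 if v lies on or above hyperplane i, else 0
    w = (planes @ v >= 0).astype(int)
    return int(sum(2 ** i * w_i for i, w_i in enumerate(w)))

planes = np.array([[1.0, 0.0],   # hyperplane 0: sign of x
                   [0.0, 1.0],   # hyperplane 1: sign of y
                   [1.0, 1.0]])  # hyperplane 2: sign of x + y
bucket = hash_value(planes, np.array([2.0, -1.0]))  # w = [1, 0, 1] -> bucket 5
```

With $n = 3$ hyperplanes this yields $2^3 = 8$ possible buckets, matching the $2^n = h$ relation above.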

Approximate K-Nearest Neighbors

Use multiple sets of hyperplanes to determine a final set of vectors that are candidate nearest neighbors.

Flow:

  1. Generate multiple hash tables

  2. Given a target vector, hash it and collect all items that share a bucket with it in any of the hash tables.

  3. Run the nearest neighbor search on the reduced candidate list.
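The three steps above can be sketched end to end. Everything here (dataset size, number of tables, the query vector) is an illustrative assumption:

```python
import numpy as np

np.random.seed(1)
num_dimensions, num_planes, num_tables = 2, 3, 5
vectors = np.random.normal(size=(20, num_dimensions))  # hypothetical dataset

def hash_value(planes, v):
    w = (planes @ v >= 0).astype(int)
    return int((2 ** np.arange(len(w))) @ w)

# 1. Generate multiple hash tables, each from its own random hyperplanes
planes_list = [np.random.normal(size=(num_planes, num_dimensions))
               for _ in range(num_tables)]
tables = []
for planes in planes_list:
    table = {}
    for idx, vec in enumerate(vectors):
        table.setdefault(hash_value(planes, vec), []).append(idx)
    tables.append(table)

# 2. Hash the query and collect every item sharing a bucket in any table
query = np.array([0.5, -0.5])
candidates = set()
for planes, table in zip(planes_list, tables):
    candidates.update(table.get(hash_value(planes, query), []))
if not candidates:               # fall back to brute force if every bucket misses
    candidates = set(range(len(vectors)))

# 3. Exact nearest-neighbor search on the reduced candidate list only
best = min(candidates, key=lambda i: np.linalg.norm(vectors[i] - query))
```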

Code Example

import numpy as np

num_dimensions = 2
num_planes = 3

# Each row is the normal vector of one random hyperplane
random_planes_matrix = np.random.normal(size=(num_planes, num_dimensions))

def side_of_plane_matrix(P, v):
    # Sign of each dot product tells which side of each hyperplane v is on
    dot_product = np.dot(P, v.T)
    sign_of_dot_product = np.sign(dot_product)
    return sign_of_dot_product

v = np.array([[2, 2]])  # example word vector
sides_matrix = side_of_plane_matrix(random_planes_matrix, v)
# then use \sum_{i=0}^{n-1} 2^i * w_i to calculate the hash value

Completed Notebook

Vector Space Model Notebook

  • Manual PCA

Machine Translation and Nearest Neighbor Search Notebook

  • LSH and approximate nearest neighbors

  • Manual Gradient Descent
