Let X be an m×n matrix whose rows are the embeddings of English words, and let Y be the matrix whose rows are the embeddings of the corresponding French words. The translation matrix R is the matrix that minimizes the squared Frobenius norm of the difference, $\|XR - Y\|_F^2$.
R can be obtained by iterative gradient-descent updates:
$$\mathrm{Loss} = \frac{1}{m}\,\|XR - Y\|_F^2, \qquad g = \frac{d\,\mathrm{Loss}}{dR} = \frac{2}{m}\,X^T(XR - Y), \qquad R \leftarrow R - \alpha\, g$$
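A minimal NumPy sketch of these updates (the random initialization, step count, and learning rate are illustrative choices, not prescribed above):

```python
import numpy as np

def align_embeddings(X, Y, steps=1000, learning_rate=0.01):
    """Learn R minimizing (1/m) * ||XR - Y||_F^2 by gradient descent.

    X: (m, n) English embeddings, one word per row.
    Y: (m, n) corresponding French embeddings.
    """
    m, n = X.shape
    R = np.random.rand(n, n)  # illustrative random initialization
    for _ in range(steps):
        gradient = (2 / m) * (X.T @ (X @ R - Y))  # d Loss / dR
        R -= learning_rate * gradient             # R <- R - alpha * g
    return R
```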
Using K-Nearest Neighbors for Translation
Once we have R, we can map an English word vector into the French word vector space. However, the mapped vector will rarely coincide exactly with a French word vector, so we use K-nearest neighbor search to identify exactly which word to translate to.
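A small sketch of that lookup, assuming cosine similarity as the distance measure; `french_embeddings` (the French embedding matrix) and `english_vec` are hypothetical names used for illustration:

```python
import numpy as np

def nearest_neighbors(v, candidates, k=1):
    """Return indices of the k rows of `candidates` most cosine-similar to v."""
    sims = (candidates @ v) / (
        np.linalg.norm(candidates, axis=1) * np.linalg.norm(v))
    return np.argsort(-sims)[:k]  # highest similarity first

# XR rarely lands exactly on a French vector, so take the closest rows:
# top_k = nearest_neighbors(english_vec @ R, french_embeddings, k=3)
```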
Locality Sensitive Hashing for Faster Search
We can define n hyperplanes with a normal-vector matrix V. Each word can be represented with $w_i = 0$ or $w_i = 1$ according to whether it lies below or above hyperplane i. The final hash value of each word can then be calculated as $\sum_{i=0}^{n-1} 2^i\, w_i$.
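A sketch of this hash function, assuming the hyperplanes pass through the origin so that the sign of the dot product with each row (normal vector) of V decides the side:

```python
import numpy as np

def hash_value(v, V):
    """Hash vector v with the n hyperplanes whose normals are the rows of V."""
    w = (V @ v >= 0).astype(int)                    # w_i: side of hyperplane i
    return int(np.sum(2 ** np.arange(len(w)) * w))  # sum_i 2^i * w_i
```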
So, overall, we first determine the number of hash buckets h. This can be decided by dividing the total number of items by the desired item count per bucket. Then, we generate n hyperplanes, where n is chosen so that $2^n = h$. Under one set of hyperplanes, two items with the same hash value land in the same bucket. We can repeat this process with several independent sets of hyperplanes to improve accuracy.
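Building one such table might look like the sketch below, which reuses `hash_value` from above; the bucket arithmetic in the comment is an illustrative example:

```python
import numpy as np
from collections import defaultdict

def build_hash_table(vectors, n_planes):
    """Group row vectors into 2**n_planes buckets via one random set of hyperplanes."""
    V = np.random.randn(n_planes, vectors.shape[1])  # one normal vector per row
    table = defaultdict(list)
    for idx, v in enumerate(vectors):
        table[hash_value(v, V)].append(idx)
    return V, table

# e.g. 10,000 items at ~16 per bucket gives h ~ 625 buckets,
# so n_planes = ceil(log2(625)) = 10 hyperplanes.
```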
Approximate K-Nearest Neighbors
Use multiple sets of hyperplanes (one hash table each) to determine the final set of vectors that are potential candidates for nearest neighbors.
Flow:
Generate multiple hash tables
Given a target vector, hash it with each table's hyperplanes and collect all items that share its bucket, across all hash tables.
Run the exact nearest neighbor search on the reduced candidate list, as in the sketch below.
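Putting the flow together, a sketch that pools bucket members across tables (reusing `hash_value` and the `(V, table)` pairs returned by `build_hash_table` above), then runs an exact cosine-similarity search over the pooled candidates:

```python
import numpy as np

def approximate_knn(v, vectors, tables, k=3):
    """tables: list of (V, table) pairs, one per independent set of hyperplanes."""
    candidates = set()
    for V, table in tables:                       # step 2: items sharing a bucket
        candidates.update(table.get(hash_value(v, V), []))
    candidates = list(candidates)
    subset = vectors[candidates]                  # step 3: exact search on subset
    sims = (subset @ v) / (
        np.linalg.norm(subset, axis=1) * np.linalg.norm(v))
    return [candidates[i] for i in np.argsort(-sims)[:k]]

# tables = [build_hash_table(vectors, n_planes=10) for _ in range(5)]
# neighbors = approximate_knn(query, vectors, tables, k=3)
```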