Content

In Assignment 2, we will review the mathematics behind Word2Vec and build a neural dependency parser using PyTorch.

Math for Word2Vec

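The most important equation in Word2Vec is the probability of an outside word $o$ given a center word $c$:

$$P(O = o \mid C = c) = \frac{\exp(u_o^\top v_c)}{\sum_{w \in \text{Vocab}} \exp(u_w^\top v_c)}$$

And the objective function is (the columns of $U$ are all the outside vectors):

$$J_{\text{naive-softmax}}(v_c, o, U) = -\log P(O = o \mid C = c)$$

which is the same as the cross-entropy loss between the one-hot true distribution $y$ and the predicted distribution $\hat{y}$. So to update parameters such as $v_c$ and $u_w$, we can use the following gradient equations:

$$\frac{\partial J}{\partial v_c} = U(\hat{y} - y)$$

And for the outside vectors $u_w$, we have:

$$\frac{\partial J}{\partial u_w} = (\hat{y}_w - y_w)\, v_c, \qquad \text{or in matrix form} \qquad \frac{\partial J}{\partial U} = v_c (\hat{y} - y)^\top$$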

For the details of the derivation, see N1 Representing Words or the Code.
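The softmax probability, the cross-entropy loss, and the two gradients discussed in this section can be checked numerically. Below is a minimal sketch in plain Python, not the assignment's implementation. The function name `naive_softmax_loss_and_grads` is hypothetical, and the outside vectors are stored as rows of `U` (one list per vocabulary word) purely for convenience, whereas the notes describe them as columns.

```python
import math


def naive_softmax_loss_and_grads(v_c, U, o):
    """Naive-softmax loss and gradients for one (center, outside) pair.

    v_c: center word vector, a list of d floats.
    U:   outside vectors, a list of |V| rows of d floats each
         (assumption of this sketch: rows instead of columns).
    o:   index of the true outside word.
    """
    # Scores u_w . v_c for every word w in the vocabulary.
    scores = [sum(u_i * v_i for u_i, v_i in zip(u_w, v_c)) for u_w in U]

    # Softmax, shifted by the max score for numerical stability.
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    y_hat = [e / total for e in exps]

    # Cross-entropy against a one-hot target reduces to -log y_hat[o].
    loss = -math.log(y_hat[o])

    # delta = y_hat - y, where y is one-hot at index o.
    delta = [p - (1.0 if w == o else 0.0) for w, p in enumerate(y_hat)]

    # dJ/dv_c = sum_w delta_w * u_w
    grad_v_c = [
        sum(delta[w] * U[w][i] for w in range(len(U)))
        for i in range(len(v_c))
    ]
    # dJ/du_w = delta_w * v_c, one row per vocabulary word.
    grad_U = [[delta[w] * v_i for v_i in v_c] for w in range(len(U))]
    return loss, grad_v_c, grad_U
```

A finite-difference check (perturb one coordinate of `v_c` and compare the change in loss with the analytic gradient) is a quick way to confirm the derivation.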

Adam optimizer

  • Advantages of momentum

    1. Smooths out the updates by keeping a running average of the gradients, which reduces the variance between minibatches.
    2. Lets the model keep a consistent direction in parameter space, leading to faster convergence.
  • Advantages of adaptive learning rates

    1. Gives parameters with small or infrequent gradients larger effective steps, which can help the model avoid stalling on plateaus or near saddle points.
    2. Prevents parameters with large gradients from dominating the optimization process, making the optimization more stable.
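The two ideas above, momentum and per-parameter adaptive step sizes, can be seen together in a single update step. The following is a minimal sketch of one Adam step on plain Python lists. It includes the bias-correction terms from the standard formulation of Adam, which a simplified presentation may omit. The function name `adam_step` and the default hyperparameters are assumptions of this sketch.

```python
import math


def adam_step(params, grads, m, v, t,
              lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. All vector arguments are lists of floats.

    m: running average of gradients (the momentum term).
    v: running average of squared gradients (the adaptive scale).
    t: 1-based step counter, used for bias correction.
    Returns the updated (params, m, v).
    """
    new_params, new_m, new_v = [], [], []
    for p, g, m_i, v_i in zip(params, grads, m, v):
        # Momentum: exponentially weighted average of past gradients.
        m_i = beta1 * m_i + (1 - beta1) * g
        # Adaptive scale: exponentially weighted average of g^2.
        v_i = beta2 * v_i + (1 - beta2) * g * g
        # Bias correction, since m and v start at zero.
        m_hat = m_i / (1 - beta1 ** t)
        v_hat = v_i / (1 - beta2 ** t)
        # Dividing by sqrt(v_hat) shrinks steps for large gradients
        # and enlarges them for small ones.
        new_params.append(p - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(m_i)
        new_v.append(v_i)
    return new_params, new_m, new_v
```

On the very first step, a parameter with gradient 100 and one with gradient 0.01 both move by roughly `lr`, which illustrates why large gradients do not dominate the update.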

Dropout layer

Dropout should be applied during training because it helps prevent overfitting: randomly dropping units from the hidden layer stops the network from relying on any single unit and reduces co-adaptation between units. However, dropout should not be applied during evaluation. At test time we want deterministic predictions that use the full capacity of the trained network, and randomly dropping units would only add noise and lower accuracy.
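The difference between training and evaluation behavior can be sketched in a few lines. This is a minimal illustration of inverted dropout (scaling the kept units by 1 / (1 - p_drop) during training so that no rescaling is needed at evaluation time), written in plain Python instead of PyTorch. The function name `dropout` and its parameters are assumptions of this sketch.

```python
import random


def dropout(h, p_drop, training, rng=random):
    """Inverted dropout on a list of hidden activations.

    h:        hidden-layer activations, a list of floats.
    p_drop:   probability of dropping each unit, in [0, 1).
    training: if False, the input is returned unchanged.
    rng:      source of randomness (a random.Random instance or the module).
    """
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        # Evaluation: use every unit, no randomness.
        return list(h)
    keep = 1.0 - p_drop
    # Training: zero each unit with probability p_drop and scale the
    # survivors so the expected activation stays the same.
    return [x / keep if rng.random() < keep else 0.0 for x in h]
```

In PyTorch, the same switch is made by calling `model.train()` before training and `model.eval()` before evaluation, which toggles the behavior of `nn.Dropout` layers.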

Neural dependency parsing

One hidden layer classification, designed as follows:

Results: UAS (unlabeled attachment score)

  • Dev: 88.52%
  • Test: 89.42%
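UAS is the fraction of words whose predicted head matches the gold head, ignoring dependency labels. The sketch below computes it for a single sentence. The function name `uas` and the head annotations in the example are illustrative assumptions, not output from the trained parser, and punctuation is left out for simplicity.

```python
def uas(gold_heads, pred_heads):
    """Unlabeled attachment score: share of words with the correct head.

    Both arguments are lists where entry i is the head index of word i+1,
    and 0 denotes ROOT.
    """
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted parses differ in length")
    if not gold_heads:
        raise ValueError("empty sentence")
    correct = sum(g == p for g, p in zip(gold_heads, pred_heads))
    return correct / len(gold_heads)


# Words: 1 Moscow, 2 sent, 3 troops, 4 into, 5 Afghanistan (0 is ROOT).
# Hypothetical gold parse: "Afghanistan" attaches to the verb "sent".
gold = [2, 0, 2, 5, 2]
# Hypothetical prediction: "Afghanistan" is wrongly attached to "troops".
pred = [2, 0, 2, 5, 3]

print(uas(gold, pred))  # 0.8
```

One wrong head out of five words gives a UAS of 0.8 for this sentence. The reported scores are the same ratio computed over all words in the dev or test set.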

Parsing error

Example

  • Moscow sent troops into Afghanistan.
  • Leaving the store unattended, I went outside to watch the parade.
  • I am extremely short.
  • Would you like brown rice or garlic naan?

| Error name | Explanation | Example |
| --- | --- | --- |
| Prepositional Phrase Attachment Error | A prepositional phrase (PP) is attached to the wrong head word. | In “… into Afghanistan …”, the PP is wrongly attached to troops; it should attach to the verb sent (“troops sent [into Afghanistan]”). |
| Verb Phrase Attachment Error | A verb phrase (VP) is attached to the wrong head word in the clause. | In “Leaving the store unattended, I went outside to watch the parade,” the VP should attach to went (main clause), not to watch. |
| Modifier Attachment Error | A modifier (e.g., adverb/adjective) is attached to the wrong head word. | In “I am extremely short,” the adverb extremely should modify short, not am. |
| Coordination Attachment Error | In a coordination, the second conjunct is attached to the wrong head; it should attach to the head word of the first conjunct. | In “Would you like brown rice or garlic naan?”, garlic naan should attach to rice, the head of the first conjunct (i.e., [brown rice] or [garlic naan]). |

Note

  1. Understanding the mathematics behind Word2Vec and neural dependency parsing is crucial for understanding how these models work.
  2. The neural dependency parsing model is a simple neural network, but it performs very well.
  3. When building a neural network, it helps to first understand the overall picture of the model structure.

Code

Assign 2 Completion