Grading

Info

  1. Simple: 14.58
  2. Medium: 18.04
  3. Strong: 25.20
  4. Boss: 29.13
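
The baselines above can be read as BLEU thresholds. The following is a minimal sketch for checking which baseline a given score reaches; the helper name `highest_baseline` and the rule that a score passes when it is greater than or equal to the threshold are assumptions for illustration, not part of the official grading.

```python
# Baseline thresholds copied from the list above, in increasing order.
BASELINES = [("Simple", 14.58), ("Medium", 18.04), ("Strong", 25.20), ("Boss", 29.13)]


def highest_baseline(bleu):
    """Return the name of the highest baseline reached, or None.

    Hypothetical helper: assumes a baseline is passed when bleu >= threshold.
    """
    passed = None
    for name, threshold in BASELINES:
        if bleu >= threshold:
            passed = name
    return passed


# Test BLEU scores reported in the Solution section.
rnn_level = highest_baseline(17.19)
transformer_level = highest_baseline(26.70)
print("RNN:", rnn_level)
print("Transformer:", transformer_level)
```

Under that assumption, the RNN result reaches the Simple baseline and the Transformer result reaches the Strong baseline.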

Solution

  Model        HyperParameters  Train loss  Valid loss  Valid BLEU  Test BLEU  Timer
  RNN          1                4.0826      3.8945      16.58       17.19      ~41 m
  Transformer  1-3              3.0306      3.2159      26.36       26.70      ~6 h

Techniques

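  1. Learning rate schedule (linear warmup, then inverse square root decay)
    def get_rate(d_model, step_num, warmup_step):
        # step_num must be >= 1; step_num = 0 would raise ZeroDivisionError.
        # The rate grows linearly until warmup_step, then decays as step_num ** -0.5.
        lr = (d_model ** -0.5) * min(step_num ** -0.5, step_num * warmup_step ** -1.5)
        return lr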
  2. epoch = 30
  3. Transformer
       from argparse import Namespace

       arch_args = Namespace(
           encoder_embed_dim=512,
           encoder_ffn_embed_dim=2048,
           encoder_layers=4,
           decoder_embed_dim=512,
           decoder_ffn_embed_dim=2048,
           decoder_layers=4,
           # tie the decoder's input and output embedding weights
           share_decoder_input_output_embed=True,
           dropout=0.3,
       )
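
To make the learning rate schedule from item 1 more concrete, here is a small sketch that evaluates it at a few steps. `d_model = 512` matches `encoder_embed_dim` above; `warmup_step = 4000` is an assumed value chosen only for illustration, since the warmup length used in the actual run is not stated here.

```python
def get_rate(d_model, step_num, warmup_step):
    # Same formula as in item 1; step_num must be >= 1.
    return (d_model ** -0.5) * min(step_num ** -0.5, step_num * warmup_step ** -1.5)


d_model = 512        # matches encoder_embed_dim in the Transformer config
warmup_step = 4000   # assumption: illustrative value, not taken from the run

# The rate rises linearly during warmup and decays afterwards.
for step in (1, 1000, 2000, 4000, 16000):
    print(step, get_rate(d_model, step, warmup_step))

# The maximum is reached at step_num == warmup_step.
peak = get_rate(d_model, warmup_step, warmup_step)
print("peak:", peak)
```

With these values the peak equals `(d_model * warmup_step) ** -0.5`, roughly 7e-4. The rate at step 2000 (halfway through warmup) and at step 16000 (four times the warmup length) are both about half of the peak.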

Reflection

Note

  1. This homework is very similar to CS224N Assignment 3, which is a Chinese-to-English translation task.
  2. The Transformer performs better (test BLEU 26.70 vs. 17.19 for the RNN), but the heavier model takes much longer to train (~6 h vs. ~41 m).
  3. The fairseq module is useful but difficult to get working. Several attempts failed; in the end I created a new environment with python=3.8 and installed an older version of fairseq, which yielded the scores above.
  4. Training took a long time because it stopped more than 10 times due to the hardware limitations of my laptop, so I stopped at the strong baseline.
  5. I also tried Kaggle, which didn’t work either, again because of problems getting fairseq to run.

Code

HW4 strong

Reference

Aaricis HW5