Grading

Info

Simple: 14.58

Medium: 18.04

Strong: 25.20

Boss: 29.13

Solution

Model	Hyperparameters	Train loss	Valid loss	Valid BLEU	Test BLEU	Training time
RNN	1	4.0826	3.8945	16.58	17.19	~41 m
Transformer	1-3	3.0306	3.2159	26.36	26.70	~6 h
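To read the table against the grading baselines, here is a minimal sketch that maps a BLEU score to the highest baseline it meets. It assumes the thresholds are compared against test BLEU; `highest_baseline` is a hypothetical helper, not part of the homework code.

```python
# Baseline thresholds copied from the Grading section (BLEU).
BASELINES = {"simple": 14.58, "medium": 18.04, "strong": 25.20, "boss": 29.13}


def highest_baseline(bleu):
    """Return the name of the highest baseline that `bleu` meets, or None."""
    # The dict is written in ascending order, so the last match is the highest.
    passed = [name for name, threshold in BASELINES.items() if bleu >= threshold]
    return passed[-1] if passed else None


# Test BLEU values copied from the results table.
for model, test_bleu in [("RNN", 17.19), ("Transformer", 26.70)]:
    print(model, test_bleu, highest_baseline(test_bleu))
```

Under that assumption the RNN clears only the simple baseline, while the Transformer clears the strong one and falls short of boss.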

Techniques

Learning rate

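def get_rate(d_model, step_num, warmup_step):
    # Noam schedule: linear warmup for warmup_step steps, then inverse-square-root decay.
    # step_num must be >= 1 (0 ** -0.5 raises ZeroDivisionError).
    lr = (d_model ** -0.5) * min(step_num ** -0.5, step_num * warmup_step ** -1.5)
    return lr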
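To see the shape of this schedule, the sketch below evaluates it at a few steps. `D_MODEL = 512` matches the embedding size used in the Transformer config; `WARMUP_STEP = 4000` is an assumed value chosen only for illustration, since the notes do not state it.

```python
def get_rate(d_model, step_num, warmup_step):
    # Same formula as above; step_num must start at 1.
    return (d_model ** -0.5) * min(step_num ** -0.5, step_num * warmup_step ** -1.5)


D_MODEL = 512        # matches encoder_embed_dim in the Transformer config
WARMUP_STEP = 4000   # assumed value, not stated in these notes

# The rate rises linearly until WARMUP_STEP, peaks there, then decays as 1/sqrt(step).
for step in (1, 1000, 4000, 16000, 64000):
    print(f"step {step:>6}: lr = {get_rate(D_MODEL, step, WARMUP_STEP):.6f}")
```

The two arguments of `min` are equal exactly at `step_num == warmup_step`, which is where the peak learning rate occurs. After that, quadrupling the step count halves the rate.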

epoch = 30

Transformer

from argparse import Namespace

arch_args = Namespace(
    encoder_embed_dim=512,
    encoder_ffn_embed_dim=2048,
    encoder_layers=4,
    decoder_embed_dim=512,
    decoder_ffn_embed_dim=2048,
    decoder_layers=4,
    share_decoder_input_output_embed=True,
    dropout=0.3,
)
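To get a feel for how heavy this configuration is, the following sketch estimates the parameter count of the encoder and decoder stacks. It assumes a standard Transformer layer layout (attention and feed-forward projections with biases, two layer norms per encoder layer, three per decoder layer) and leaves out the embeddings, because the vocabulary size is not given here. All helper names are hypothetical, and the numbers are a rough estimate, not a count read from the actual fairseq model.

```python
from argparse import Namespace


def attention_params(d):
    # Q, K, V and output projections, each d x d weights plus d biases.
    return 4 * (d * d + d)


def ffn_params(d, d_ffn):
    # Two linear layers: d -> d_ffn and d_ffn -> d, with biases.
    return (d * d_ffn + d_ffn) + (d_ffn * d + d)


def layer_norm_params(d):
    # One scale vector and one shift vector.
    return 2 * d


def count_layer_params(args):
    """Rough parameter counts for the encoder and decoder stacks, excluding embeddings."""
    de, fe = args.encoder_embed_dim, args.encoder_ffn_embed_dim
    dd, fd = args.decoder_embed_dim, args.decoder_ffn_embed_dim
    # Encoder layer: self-attention + feed-forward + 2 layer norms.
    enc_layer = attention_params(de) + ffn_params(de, fe) + 2 * layer_norm_params(de)
    # Decoder layer: self-attention + cross-attention + feed-forward + 3 layer norms.
    # Assumes encoder and decoder share the same embedding size, as in this config.
    dec_layer = 2 * attention_params(dd) + ffn_params(dd, fd) + 3 * layer_norm_params(dd)
    return args.encoder_layers * enc_layer, args.decoder_layers * dec_layer


arch_args = Namespace(
    encoder_embed_dim=512, encoder_ffn_embed_dim=2048, encoder_layers=4,
    decoder_embed_dim=512, decoder_ffn_embed_dim=2048, decoder_layers=4,
)
enc, dec = count_layer_params(arch_args)
print(f"encoder stack: {enc:,} parameters")
print(f"decoder stack: {dec:,} parameters")
```

The decoder stack comes out larger than the encoder stack at the same depth because each decoder layer carries an extra cross-attention block.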

Reflection

Note

This homework is very similar to CS224N Assignment 3, which is a Chinese-to-English translation task.

Using a Transformer gives better results (test BLEU 26.70 vs. 17.19 for the RNN), but the heavier model takes much longer to train (about 6 hours vs. about 41 minutes).

The fairseq module is useful but difficult to set up. I tried several times to get it working and failed. In the end I built a new environment with python=3.8 and installed an older version of fairseq, which yielded the scores above.

Training took a long time because it was interrupted more than 10 times by the hardware limits of my laptop, so I stopped at the strong baseline.

I also tried Kaggle, but that did not work either, again because of problems with fairseq.

Code

HW4 strong

Reference

Aaricis HW5