Grading
Info
- Simple: 14.58
- Medium: 18.04
- Strong: 25.20
- Boss: 29.13
Solution
| Model | Hyperparameters | Train loss | Valid loss | Valid BLEU | Test BLEU | Training time |
|---|---|---|---|---|---|---|
| RNN | 1 | 4.0826 | 3.8945 | 16.58 | 17.19 | ~41 m |
| Transformer | 1-3 | 3.0306 | 3.2159 | 26.36 | 26.70 | ~6 h |
Techniques
- learning rate
    def get_rate(d_model, step_num, warmup_step):
        lr = (d_model ** -0.5) * min(step_num ** -0.5, step_num * warmup_step ** -1.5)
        return lr
- epoch = 30
- Transformer
    arch_args = Namespace(
        encoder_embed_dim=512,
        encoder_ffn_embed_dim=2048,
        encoder_layers=4,
        decoder_embed_dim=512,
        decoder_ffn_embed_dim=2048,
        decoder_layers=4,
        share_decoder_input_output_embed=True,
        dropout=0.3,
    )
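As a rough illustration of how the learning-rate schedule above behaves, the sketch below evaluates it over a range of steps and finds where it peaks. The `d_model` value matches `encoder_embed_dim=512` from the notes; `warmup_step=4000` and the 20,000-step range are assumed values chosen only for the example, and the guard against step 0 is an addition, not part of the original function.

```python
def get_rate(d_model, step_num, warmup_step):
    """Linear warmup followed by inverse-square-root decay."""
    # Added guard (assumption): step 0 would raise ZeroDivisionError in 0 ** -0.5.
    step_num = max(step_num, 1)
    return (d_model ** -0.5) * min(step_num ** -0.5, step_num * warmup_step ** -1.5)


# d_model follows encoder_embed_dim=512; warmup_step=4000 is a hypothetical value.
d_model, warmup_step = 512, 4000

# Evaluate the schedule for steps 1..20000 and locate the step with the highest rate.
rates = [get_rate(d_model, step, warmup_step) for step in range(1, 20001)]
peak_step = max(range(len(rates)), key=rates.__getitem__) + 1

print("peak step:", peak_step)
print("peak learning rate:", rates[peak_step - 1])
```

Under these assumptions the rate grows linearly until `step_num` equals `warmup_step`, where the two arguments of `min` coincide, and then decays proportionally to `step_num ** -0.5`. A larger `warmup_step` therefore lowers the peak rate and delays it.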
Reflection
Note
- This homework is very similar to CS224N Assignment 3, which is a Chinese-to-English translation task.
- Using a Transformer gives better performance, but the heavier model takes much longer to train.
- The `fairseq` module is useful but difficult to set up. I tried several times to get it working and failed. In the end I built a new environment with `python=3.8` and installed an older version of `fairseq`, which yielded the scores above.
- Training took a long time because it stopped more than 10 times due to the hardware limitations of my laptop, so I stopped at the strong baseline.
- I also tried Kaggle, which did not work either, again because of problems with `fairseq`.