my_language_model

Overview

This repository contains my sample code for natural language generative models, whose task is translation, usually from Chinese to English. This is an interesting topic given the debate between MLE and GAN approaches to NLP in Caccia+18.

Model and algorithm

I use the Transformer of Vaswani+17 as my model, and the training objective is maximum likelihood estimation (MLE).
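
As a minimal sketch of this setup (assuming PyTorch and its built-in nn.Transformer, which may differ from the notebooks' actual implementation, and placeholder hyperparameters), the MLE objective reduces to token-level cross-entropy under teacher forcing:

```python
import torch
import torch.nn as nn

# Hypothetical sizes; the notebooks' actual hyperparameters may differ.
VOCAB_SIZE, D_MODEL, PAD_ID = 8000, 512, 0

embed = nn.Embedding(VOCAB_SIZE, D_MODEL, padding_idx=PAD_ID)
model = nn.Transformer(d_model=D_MODEL, batch_first=True)
proj = nn.Linear(D_MODEL, VOCAB_SIZE)
loss_fn = nn.CrossEntropyLoss(ignore_index=PAD_ID)

def mle_step(src_ids, tgt_ids):
    """One teacher-forced step: predict tgt[t] from src and tgt[:t]."""
    tgt_in, tgt_out = tgt_ids[:, :-1], tgt_ids[:, 1:]
    causal = nn.Transformer.generate_square_subsequent_mask(tgt_in.size(1))
    hidden = model(embed(src_ids), embed(tgt_in), tgt_mask=causal)
    logits = proj(hidden)  # (batch, len, vocab)
    return loss_fn(logits.reshape(-1, VOCAB_SIZE), tgt_out.reshape(-1))
```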

Preprocessing

Before training starts, all sentences are encoded as sequences of one-hot vectors, with sequence length capped at a fixed maximum.
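
A minimal sketch of this step (the vocabulary size and length cap below are placeholders, not the repository's actual values):

```python
import numpy as np

def encode_one_hot(token_ids, vocab_size, max_len):
    """Encode one tokenized sentence as a (max_len, vocab_size) one-hot
    matrix, truncating long sentences and zero-padding short ones."""
    one_hot = np.zeros((max_len, vocab_size), dtype=np.float32)
    for pos, tok in enumerate(token_ids[:max_len]):  # enforce the length cap
        one_hot[pos, tok] = 1.0
    return one_hot

# Example: a 3-token sentence, vocabulary of 5 tokens, capped at 4 positions.
print(encode_one_hot([2, 4, 1], vocab_size=5, max_len=4))
```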

Pretraining

I use self-supervised learning with an MLE objective for pretraining. Self-supervised learning can make use of monolingual sentences, which are usually cheaper to collect than bilingual sentence pairs, so more training data is affordable.
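
The README does not spell out the self-supervised task; one common choice consistent with an MLE objective is denoising, i.e. corrupting a monolingual sentence and training the model to reconstruct it. A hypothetical sketch reusing mle_step from above:

```python
import torch

def pretrain_step(mono_ids, mask_id=1, p=0.15):
    """Hypothetical denoising pretraining on monolingual data: replace
    roughly a fraction p of tokens with a mask token, then reconstruct
    the original sentence with the usual MLE loss."""
    noise = torch.rand(mono_ids.shape) < p
    corrupted = mono_ids.masked_fill(noise, mask_id)
    return mle_step(src_ids=corrupted, tgt_ids=mono_ids)
```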

Experiments

I also implement other algorithms on top of this model; each notebook is named after the technique it adds:

  • In the notebook named 'LSTM', I replace the Transformer with a Long Short-Term Memory (LSTM) RNN.
  • In the notebook named 'seqGAN', I combine MLE with a Sequential Generative Adversarial Network (SeqGAN, Yu+16).
  • In the notebook named 'TayPO', I use Taylor expansions to generalize Proximal Policy Optimization (Tang+20) and apply the result to my seqGAN.
  • In the notebook named 'radam', I replace the Adam optimizer with Rectified Adam (RAdam, Liu+19).
  • In the notebook named 'slowMLE', I train with a two-stage learning rate, 10^-3 followed by 10^-5 (see the sketch after this list).
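
For the two-stage schedule, a sketch of one plausible implementation (only the two rates are documented; the switch-over epoch below is an assumption):

```python
import torch

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # stage one: 10^-3
STAGE_TWO_EPOCH = 10  # assumed; the notebook's actual switch point may differ

for epoch in range(20):
    if epoch == STAGE_TWO_EPOCH:
        for group in optimizer.param_groups:
            group["lr"] = 1e-5  # stage two: 10^-5
    # ... one epoch of MLE training with mle_step(...) goes here ...
```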
