TODO Add teacher forcing to the decoder. Add predict file. Refactor train.py. Add checkpoint per step. Add Tensorboard. Add better logging. Add distributed training. Add documentation. Add load from checkpoint. add pretrained model.