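Transformers
Paper: "Attention Is All You Need".
The problem with RNNs is that they struggle to remember information across long text. The Transformer uses an encoder and a decoder: the encoder's output is fed to the decoder, which also takes its own previously generated output as input.
Good references: https://arxiv.org/pdf/1706.03762.pdf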
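
To make the attention idea concrete, here is a minimal sketch of scaled dot-product attention, softmax(Q K^T / sqrt(d_k)) V, as defined in the referenced paper. It is an illustration only, not the paper's full model: the function names and the toy 2-dimensional token vectors are assumptions for this example, and the learned projections, multiple heads, masking, and positional encodings are all left out.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of vectors.

    Returns (outputs, weights): one output vector and one row of
    attention weights per query.
    """
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is a weighted average of the value vectors.
        out = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy self-attention: the same (made-up) token vectors serve as
# queries, keys, and values. A real model would first apply learned
# linear projections to get Q, K, and V.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = scaled_dot_product_attention(tokens, tokens, tokens)

for row in weights:
    print([round(w, 3) for w in row])
```

Every token attends to every other token in a single step, regardless of distance, which is why this mechanism does not have the RNN's difficulty with long text. Each row of weights sums to 1.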