Status: Read
Author: Ashish Vaswani, Illia Polosukhin, Noam Shazeer, Łukasz Kaiser
Topic: Attention, Text, Transformers
Category: Architecture
Conference: NIPS
Year: 2017
Summary: Introduces the Transformer architecture, which reaches SOTA performance on a range of NLP tasks
The authors propose a new architecture for NLP tasks that removes the sequential-processing bottleneck of RNNs. It relies solely on attention to combine the words in a sequence and model the relations between them.
- Uses only attention layers and feed-forward neural networks, with no recurrence
- Introduces positional encodings to give the model a sense of token order
- Uses multi-head attention to attend to the embeddings in different representation subspaces
- Transformers are powerful models that give SOTA performance on various machine learning tasks
- They have become the default first choice for many new problems
- BERT, T5, GPT-2, GPT-3, etc. are all Transformer-based models
- Hugging Face Transformers is an open-source library on GitHub that provides implementations of many Transformer models and tasks
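
The attention and positional-encoding ideas noted above can be sketched in a few lines of plain Python. This is a minimal illustration, not the paper's implementation: it assumes a single attention head, no learned projection matrices, and no masking. The function names and the toy `tokens` values are made up for the example.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """Compute softmax(Q K^T / sqrt(d_k)) V for lists of row vectors."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
            for k in keys
        ]
        weights = softmax(scores)
        # Each output is the attention-weighted average of the value vectors.
        outputs.append([
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ])
    return outputs


def positional_encoding(num_positions, d_model):
    """Sinusoidal encodings: sine on even dimensions, cosine on odd ones."""
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(d_model):
            # Dimensions 2k and 2k+1 share the frequency 1 / 10000^(2k / d_model).
            angle = pos / (10000 ** ((i - i % 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        table.append(row)
    return table


# Toy input: 3 tokens with hypothetical 4-dimensional embeddings.
tokens = [[1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
pe = positional_encoding(len(tokens), 4)
x = [[t + p for t, p in zip(tok, enc)] for tok, enc in zip(tokens, pe)]

# Self-attention: the same sequence serves as queries, keys, and values.
attended = scaled_dot_product_attention(x, x, x)
```

Because every output row is computed independently of the others, all positions can be processed in parallel, which is what removes the step-by-step dependency of an RNN. Multi-head attention would run several such attention functions on differently projected copies of `x` and concatenate the results.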