Simplified implementations of transformer papers, because you don't really understand a method until you code it.
- Implement GPT
- Implement BERT
- Implement T5
- Implement p-tuning
- Implement prompt tuning
- Implement LoRA
- Implement DoRA
- Implement DPO
- Implement fill-in-the-middle (FIM) from "Efficient Training of Language Models to Fill in the Middle"
- Implement top-k
- Implement temperature
- Implement addition of new tokens
- Implement hallucination metrics
- Implement flash attention
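
To give a flavor of how small these implementations can be, here is a minimal sketch of the top-k and temperature items from the list, in plain Python with no dependencies. The function name `sample_next_token` and its parameters are hypothetical, chosen only for illustration. It assumes the model has already produced a list of raw logits for the next token, and a real implementation would more likely work on tensors.

```python
import math
import random


def sample_next_token(logits, temperature=1.0, top_k=None, rng=None):
    """Sample a token index from raw logits (hypothetical helper).

    temperature: divides the logits before softmax (<1 sharpens, >1 flattens).
    top_k: if set, only the k highest-scoring tokens can be sampled.
    """
    if not logits:
        raise ValueError("logits must be non-empty")
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    if rng is None:
        rng = random.Random()

    # Temperature scaling is applied to the logits, not the probabilities.
    scaled = [x / temperature for x in logits]

    # Rank token indices from highest to lowest scaled logit.
    indices = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)

    # Top-k filtering: drop everything outside the k best candidates.
    if top_k is not None:
        indices = indices[:max(1, top_k)]

    # Softmax weights over the survivors; subtracting the max logit
    # keeps exp() from overflowing.
    peak = scaled[indices[0]]
    weights = [math.exp(scaled[i] - peak) for i in indices]

    # random.choices normalizes the weights internally.
    return rng.choices(indices, weights=weights, k=1)[0]
```

With `top_k=1` this reduces to greedy decoding, and a very small temperature behaves almost the same way, which makes both settings easy to sanity-check against an argmax.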