This is a PyTorch (Python 3.8) implementation of the backward propagation of the Transformer, based on the paper "Attention Is All You Need".
```
torch==2.2.0
numpy==1.22.3
```
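You can install the dependencies from the repository root with `pip install -r Back_Propagation/requirements.txt`.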
```
.
|-- Back_Propagation
|   |-- Encoder.py
|   |-- FFN.py
|   |-- LayerNorm.py
|   |-- MultiHead.py
|   |-- __pycache__
|   |   |-- FFN.cpython-38.pyc
|   |   |-- LayerNorm.cpython-38.pyc
|   |   `-- MultiHead.cpython-38.pyc
|   |-- basic_layer.py
|   `-- requirements.txt
|-- LICENSE
`-- README.md
```
You can test each backward-propagation implementation by running the commented-out code at the bottom of the corresponding Python file. The following components are covered (a minimal sketch of such a check follows the list):
- FFN Layer
- Linear Layer
- Multi-head Attention Layer
- Encoder Layer
- Embedding Layer
- Decoder Layer
- Encoder
- Decoder
- Transformer
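For reference, the snippet below is a minimal sketch of this kind of check, comparing a hand-derived backward pass for a plain linear layer against PyTorch autograd. It is illustrative only and not taken from the repository's files; the shapes and variable names are assumptions.

```python
# Minimal sketch: verify a hand-derived linear-layer backward pass against autograd.
# Illustrative example only; not part of the repository's own test code.
import torch

torch.manual_seed(0)

x = torch.randn(4, 8, requires_grad=True)   # (batch, in_features)
W = torch.randn(16, 8, requires_grad=True)  # (out_features, in_features)
b = torch.randn(16, requires_grad=True)

# Forward: y = x W^T + b
y = x @ W.t() + b

# Upstream gradient dL/dy (stands in for the rest of the network)
grad_y = torch.randn_like(y)

# Hand-derived gradients for a linear layer
grad_x_manual = grad_y @ W         # dL/dx = dL/dy . W
grad_W_manual = grad_y.t() @ x     # dL/dW = (dL/dy)^T . x
grad_b_manual = grad_y.sum(dim=0)  # dL/db = sum of dL/dy over the batch

# Reference gradients from autograd
y.backward(grad_y)

print(torch.allclose(grad_x_manual, x.grad, atol=1e-6))
print(torch.allclose(grad_W_manual, W.grad, atol=1e-6))
print(torch.allclose(grad_b_manual, b.grad, atol=1e-6))
```

The same pattern (run the forward pass, feed a random upstream gradient into `backward`, and compare the manual gradients against the resulting `.grad` tensors) applies to the other layers listed above.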
Note: dropout is not taken into account in these backward computations.