A fast and elegant way to learn how to optimize a Transformer block.
The Transformer has been one of the most influential NLP models of recent years. It uses a seq2seq (encoder-decoder) architecture; the original paper targeted machine translation, and later models such as BERT extended it to almost every NLP task.
This repo uses SIMD, OpenMP, and MPI to optimize the Transformer attention block, and compares loop unrolling, SSE, OpenMP, tiling, and CUDA versions of the parallelized functions.
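As a rough illustration of what gets parallelized (the repo's actual kernels and function names may differ), here is a minimal sketch of the attention-score computation S = Q * K^T: a plain serial loop and an OpenMP-parallelized, tiled variant. The matrix layout, the `TILE` constant, and the function names below are illustrative assumptions, not this repo's API.

```cpp
#include <algorithm>
#include <omp.h>

// Illustrative tile size; the repo's real blocking factor may differ.
constexpr int TILE = 64;

// Serial baseline: S[i][j] = sum_k Q[i][k] * K[j][k]  (i.e. S = Q * K^T),
// with Q and K stored row-major as n x d matrices.
void attention_scores_serial(const float* Q, const float* K, float* S,
                             int n, int d) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < d; ++k)
                acc += Q[i * d + k] * K[j * d + k];
            S[i * n + j] = acc;
        }
}

// OpenMP + tiling: the (i, j) iteration space is split into TILE x TILE
// blocks that are distributed across threads, so each block of Q and K
// stays in cache while it is reused.
void attention_scores_omp_tiled(const float* Q, const float* K, float* S,
                                int n, int d) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE)
            for (int i = ii; i < std::min(ii + TILE, n); ++i)
                for (int j = jj; j < std::min(jj + TILE, n); ++j) {
                    float acc = 0.0f;
                    for (int k = 0; k < d; ++k)
                        acc += Q[i * d + k] * K[j * d + k];
                    S[i * n + j] = acc;
                }
}
```

An SSE variant would additionally vectorize the innermost k-loop, e.g. accumulating four floats at a time with `_mm_loadu_ps`/`_mm_mul_ps`, and a CUDA version maps the (i, j) loops onto a thread grid.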
mkdir build
cd build
cmake ..
Speedup over the serial baseline is used to compare all the results.
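Speedup here means the usual ratio of timings, assuming the serial version is the baseline:

$$ \text{speedup} = \frac{T_{\text{serial}}}{T_{\text{parallel}}} $$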