A fast and elegant way to learn how to optimize a Transformer block.
The Transformer has been one of the most influential NLP models of recent years. It uses a seq2seq (encoder-decoder) architecture; the original paper targeted machine translation, and later models such as BERT extended it to almost every NLP task.
This repo uses SIMD, OpenMP, and MPI to optimize the Transformer attention block, and compares loop unrolling, SSE, OpenMP, tiling, and CUDA versions of the parallelized functions.
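As a rough illustration of what gets parallelized (the repo's actual kernels and function names may differ), here is a minimal sketch of the attention-score computation S = Q * K^T: a plain serial loop and an OpenMP-parallelized, tiled variant. The matrix layout, the `TILE` constant, and the function names below are illustrative assumptions, not this repo's API.

```cpp
#include <algorithm>
#include <omp.h>

// Illustrative tile size; the repo's real blocking factor may differ.
constexpr int TILE = 64;

// Serial baseline: S[i][j] = sum_k Q[i][k] * K[j][k]  (i.e. S = Q * K^T),
// with Q and K stored row-major as n x d matrices.
void attention_scores_serial(const float* Q, const float* K, float* S,
                             int n, int d) {
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (int k = 0; k < d; ++k)
                acc += Q[i * d + k] * K[j * d + k];
            S[i * n + j] = acc;
        }
}

// OpenMP + tiling: the (i, j) iteration space is split into TILE x TILE
// blocks that are distributed across threads, so each block of Q and K
// stays in cache while it is reused.
void attention_scores_omp_tiled(const float* Q, const float* K, float* S,
                                int n, int d) {
    #pragma omp parallel for collapse(2) schedule(static)
    for (int ii = 0; ii < n; ii += TILE)
        for (int jj = 0; jj < n; jj += TILE)
            for (int i = ii; i < std::min(ii + TILE, n); ++i)
                for (int j = jj; j < std::min(jj + TILE, n); ++j) {
                    float acc = 0.0f;
                    for (int k = 0; k < d; ++k)
                        acc += Q[i * d + k] * K[j * d + k];
                    S[i * n + j] = acc;
                }
}
```

An SSE variant would additionally vectorize the innermost k-loop, e.g. accumulating four floats at a time with `_mm_loadu_ps`/`_mm_mul_ps`, and a CUDA version maps the (i, j) loops onto a thread grid.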
mkdir build
cd build
cmake ..
Speedup over the serial baseline is used to compare all the results.
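Speedup here means the usual ratio of timings, assuming the serial version is the baseline:

$$ \text{speedup} = \frac{T_{\text{serial}}}{T_{\text{parallel}}} $$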