tbb-matrix-multiplication

Parallel matrix multiplication using the Intel TBB library. This approach achieved the fastest running time in the Parallel programming class in 2020.

Solution

Intel C++ compiler v18
Second matrix was transposed
- Enables vectorization
- Better cache efficiency
AVX 256-bit instructions
- Doubles the speed of vectorized functions compared to the VS compiler
- Uses 256-bit registers
std::inner_product
- Standard library function that is already optimized and vectorizes pretty well
TBB tasks achieved the best time
- A tree-like hierarchy of tasks was created
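
The transposition and `std::inner_product` points above can be sketched as follows. This is a minimal serial sketch, not the code from this repository: the `Matrix` alias, the row-major square layout, and the function names `transpose` and `multiply_transposed` are assumptions made for illustration. Once the second matrix is transposed, every output element is a dot product of two contiguous rows, which is the access pattern that helps both the cache and the vectorizer.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed layout: square n x n matrix stored row-major in a flat vector.
using Matrix = std::vector<float>;

// Return the transpose of a row-major n x n matrix.
Matrix transpose(const Matrix& m, std::size_t n) {
    Matrix t(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            t[j * n + i] = m[i * n + j];
    return t;
}

// Compute C = A * B by transposing B first, so that each element of C
// is the inner product of a row of A and a row of B^T (both contiguous).
Matrix multiply_transposed(const Matrix& a, const Matrix& b, std::size_t n) {
    const Matrix bt = transpose(b, n);
    Matrix c(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        const float* row_a = a.data() + i * n;
        for (std::size_t j = 0; j < n; ++j) {
            const float* row_bt = bt.data() + j * n;
            c[i * n + j] = std::inner_product(row_a, row_a + n, row_bt, 0.0f);
        }
    }
    return c;
}
```

The outer loop over `i` has no dependencies between iterations, so it is the natural place to split the work into tasks; the sketch leaves that part out and stays serial.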

Illustration 1 - Achieved results.

Illustration 2 - Execution time.

Illustration 3 - Speedup compared to serial baseline.
