Parallel matrix multiplication using the Intel TBB library. This approach achieved the fastest running time in the Parallel Programming class in 2020.
- Intel C++ compiler v18
- Second matrix was transposed
  - Enables vectorization
  - Better cache efficiency
- AVX 256 instructions
  - Doubles the speed of vectorized functions compared to the Visual Studio compiler
  - Uses 256-bit registers
- std::inner_product
  - Standard library function that is already optimized and vectorizes pretty well
- TBB tasks achieved the best time
  - A tree-like hierarchy was created with tasks
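
The transposition and `std::inner_product` notes above can be illustrated with a minimal serial sketch. It is not the project's actual code: the `Matrix` alias, the row-major square layout, and the function names `transpose` and `multiply_transposed` are assumptions made for the example. With the second matrix transposed, each output element becomes a dot product of two contiguous rows, which is the access pattern that helps both the cache and the vectorizer.

```cpp
#include <cstddef>
#include <numeric>
#include <vector>

// Assumed layout for this sketch: n x n matrix, row-major, one contiguous buffer.
using Matrix = std::vector<double>;

// Returns the transpose of an n x n row-major matrix.
Matrix transpose(const Matrix& m, std::size_t n) {
    Matrix t(n * n);
    for (std::size_t i = 0; i < n; ++i)
        for (std::size_t j = 0; j < n; ++j)
            t[j * n + i] = m[i * n + j];
    return t;
}

// Computes C = A * B, where bt is B already transposed.
// Row i of A and row j of bt are both contiguous, so each element of C
// is a dot product over two linear memory ranges.
Matrix multiply_transposed(const Matrix& a, const Matrix& bt, std::size_t n) {
    Matrix c(n * n);
    for (std::size_t i = 0; i < n; ++i) {
        const double* row_a = a.data() + i * n;
        for (std::size_t j = 0; j < n; ++j) {
            const double* row_bt = bt.data() + j * n;
            c[i * n + j] = std::inner_product(row_a, row_a + n, row_bt, 0.0);
        }
    }
    return c;
}
```

Whether the `std::inner_product` reduction is actually vectorized depends on the compiler and its floating-point settings, so it is worth checking the vectorization report for the build in question.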
- Windows 10
- Intel Core i7-9750H processor (6 cores, 12 logical) @ 2.56 GHz (4.5 GHz boost)
- Nvidia GTX 1660 Ti
- 16 GB DDR4 RAM (dual channel)
- 512 GB M.2 SSD