sgemm
Here are 12 public repositories matching this topic...
maxas: a walkthrough of the sgemm implementation in Scott Gray's maxas assembler, explaining the parts that were (for me) missing. https://github.com/NervanaSystems/maxas
Updated Dec 22, 2018 - CSS
This repository targets performance optimization of the OpenCL gemm function. It compares the sgemm performance of several BLAS libraries (clBLAS, clBLAST, MIOpenGemm, Intel MKL (CPU), and cuBLAS (CUDA)) across different matrix sizes, hardware vendors, and operating systems. Prebuilt x86_64 binaries for MSVC, MinGW, and Linux (CentOS) are provided, so it works out of the box.
Updated Mar 28, 2019 - C
A benchmark framework for POWER and x86_64
Updated Jun 5, 2020 - Mathematica
A series on GPU optimization that explains in detail how to optimize CUDA kernels. It covers several basic kernels, including elementwise, reduce, sgemv, and sgemm. The performance of these kernels is at or near the theoretical limit.
Updated Jul 29, 2023 - Cuda
Fast, Multi-threaded Matrix Multiplication in C
Updated Oct 20, 2024 - C
General Matrix Multiplication using NVIDIA Tensor Cores
Updated Oct 25, 2024 - Cuda