Hooks CUDA-related dynamic libraries using automated code generation tools.
Updated Dec 12, 2023 - C
Face Recognition with RetinaFace and ArcFace.
Multiple GEMM operators built with CUTLASS to support LLM inference.
[WIP] PyTorch bindings for cublasLt with an example of quantized i8f16 MLP