Efficient Triton Kernels for LLM Training
Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
🎉 CUDA Learn Notes with PyTorch: fp32, fp16/bf16, fp8/int8, flash_attn, sgemm, sgemv, warp/block reduce, dot product, elementwise, softmax, layernorm, rmsnorm, histogram, etc.
A service for autodiscovery and configuration of applications running in containers
Playing with the Tigress software protection. Break some of its protections and solve their reverse engineering challenges. Automatic deobfuscation using symbolic execution, taint analysis and LLVM.
A subset of PyTorch's neural network modules, written in Python using OpenAI's Triton.
Automatic ROPChain Generation
SymGDB - symbolic execution plugin for gdb
A performance library for machine learning applications.
(WIP) A deployment framework that aims to provide a simple, lightweight, pipelined way to deploy algorithm services, ensuring reliability, high concurrency, and scalability.
ClearML - Model-Serving Orchestration and Repository Solution
NVIDIA-accelerated, deep learned model support for image space object detection
NVIDIA-accelerated DNN model inference ROS 2 packages using NVIDIA Triton/TensorRT for both Jetson and x86_64 with CUDA-capable GPU
Deploy DL/ML inference pipelines with minimal extra code.
Static analysis & deobfuscation framework for x86/x64
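Several of the projects above ship fused softmax, layernorm, and rmsnorm kernels. As a point of reference, the numerically stable softmax those kernels typically compute can be sketched in plain Python (a minimal sketch, no Triton or GPU required; the function name and structure here are illustrative, not taken from any listed repo):

```python
import math

def softmax(xs):
    """Numerically stable softmax: subtract the row max before
    exponentiating so exp() cannot overflow for large inputs."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

A Triton version of the same operation would load one row per program instance, perform the max-subtract/exp/sum reduction in registers, and store the normalized row back, which is where the fused-kernel speedups come from.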