flash-attention-2

Here are 9 public repositories matching this topic...

DefTruth / Awesome-LLM-Inference

📖 A curated list of Awesome LLM/VLM Inference Papers with code, such as FlashAttention, PagedAttention, Parallelism, etc. 🎉🎉

sora llm llms vllm llm-inference awesome-llm flash-attention flash-attention-2 tensorrt-llm paged-attention deepseek open-sora flash-attention-3

Updated Dec 22, 2024

arihanv / Shush

Shush is an app that deploys a WhisperV3 model with Flash Attention v2 on Modal and makes requests to it via a NextJS app

machine-learning modal transcription whisper huggingface-transformers shadcn-ui flash-attention-2

Updated Jun 7, 2024
TypeScript

alexzhang13 / flashattention2-custom-mask

Triton implementation of FlashAttention2 that adds Custom Masks.

deep-learning triton attention cuda-kernels attention-mechanism triton-lang flash-attention flash-attention-2

Updated Aug 14, 2024
Python

Bruce-Lee-LY / flash_attention_inference

Performance of the C++ interface of flash attention and flash attention v2 in large language model (LLM) inference scenarios.

gpu cuda inference nvidia cutlass mha multi-head-attention llm tensor-core large-language-model flash-attention flash-attention-2

Updated Sep 7, 2024
C++

BBC-Esq / WhisperS2T-transcriber

Uses the powerful WhisperS2T and CTranslate2 libraries to batch transcribe multiple files.

audio-recorder audio-recording transcription audio-transcribing transcriber audio-transcription transcr ctranslate2 flash-attention-2 whispers2t

Updated Sep 17, 2024
Python

erfanzar / jax-flash-attn2

Flash Attention implementation with multiple backend support and sharding. This module provides a flexible implementation of Flash Attention with support for different backends (GPU, TPU, CPU) and platforms (Triton, Pallas, JAX).

pallas jax flash-attention flash-attention-2

Updated Dec 3, 2024
Python

graphcore-research / flash-attention-ipu

Poplar implementation of FlashAttention for IPU

deep-learning transformers pytorch ipu graphcore poplar flash-attention flash-attention-2

Updated Mar 12, 2024
C++

gietema / attention

Toy Flash Attention implementation in torch

torch flash-attention flash-attention-2 flash-attention-3

Updated Sep 22, 2024
Python

lalitdotdev / transcribeX

Transcribe audio in minutes with OpenAI's WhisperV3 and Flash Attention v2 + Transformers without relying on third-party providers and APIs. Host it yourself or try it out.

python modal transformers transcription wavesurfer-js nvidia-cuda bun nvidia-gpu virtual-environment fastapi huggingface-transformers flash-attention-2 next14 whisper- whisperv3

Updated Jun 18, 2024
TypeScript
