(Unofficial) Building Hugging Face SmolLM, a blazingly fast and remarkably powerful small language model, with a PyTorch implementation of grouped query attention (GQA)
transformer
attention
smol
huggingface
ml-efficiency
llm
grouped-query-attention
smol-lm
huggingface-smol-lm
Updated Oct 11, 2024 - Python