(Unofficial) PyTorch build of Hugging Face SmolLM, a blazingly fast and remarkably powerful small language model, including an implementation of grouped query attention (GQA)
Some of the techniques used in the LLM pretraining design include the following; minimal sketches of each appear after the list:
Embedding sharing (the input embedding and output projection weights are tied)
Grouped query attention (GQA)
SwiGLU activations for the multi-layer perceptron (MLP) block
Immediate blockwise weight sharing
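
The headline technique here is grouped query attention. Below is a minimal PyTorch sketch of the idea (module and parameter names are illustrative, not this repo's actual code): several query heads share each key/value head, which shrinks the K/V projections and the KV cache while keeping the full set of query heads.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GroupedQueryAttention(nn.Module):
    """GQA: n_heads query heads share n_kv_heads key/value heads."""

    def __init__(self, d_model: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0, "query heads must split evenly into KV groups"
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, _ = x.shape
        q = self.q_proj(x).view(B, T, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(B, T, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Replicate each KV head so every group of query heads can attend to it.
        groups = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)  # causal LM attention
        return self.o_proj(out.transpose(1, 2).reshape(B, T, -1))
```

With, say, 8 query heads sharing 2 KV heads, the K/V projections hold a quarter of the parameters they would in standard multi-head attention.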
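
The MLP block uses SwiGLU gating rather than a plain two-layer feed-forward. A minimal sketch under the same caveat (illustrative names, not this repo's code): the input goes through a gate projection passed through SiLU and an up projection, the two are multiplied elementwise, and a down projection maps back to the model width.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SwiGLUMLP(nn.Module):
    """Feed-forward block: down(silu(gate(x)) * up(x))."""

    def __init__(self, d_model: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(d_model, hidden_dim, bias=False)
        self.up_proj = nn.Linear(d_model, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SiLU-gated linear unit: the gate branch modulates the up branch.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```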
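
Embedding sharing and blockwise weight sharing are simplest to see together in a toy decoder. The sketch below is one plausible reading of the last two list items (tying the token embedding to the LM head, and adjacent layers reusing a single block instance); the class name, sizes, and the stand-in encoder-layer block are all hypothetical, and causal masking is omitted for brevity.

```python
import torch
import torch.nn as nn

class SharedWeightDecoder(nn.Module):
    """Toy decoder: tied embeddings + blockwise weight sharing."""

    def __init__(self, vocab_size: int = 32000, d_model: int = 512,
                 n_layers: int = 8, share_factor: int = 2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        # Allocate only n_layers // share_factor unique blocks...
        unique = [nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                  for _ in range(n_layers // share_factor)]
        # ...and run each one share_factor times in adjacent layers.
        self.layers = nn.ModuleList(unique[i // share_factor] for i in range(n_layers))
        self.lm_head = nn.Linear(d_model, vocab_size, bias=False)
        # Embedding sharing: the output projection reuses the input embedding matrix.
        self.lm_head.weight = self.embed.weight

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        h = self.embed(ids)
        for layer in self.layers:
            h = layer(h)
        return self.lm_head(h)
```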