
(Unofficial) Building Hugging Face SmolLM, a blazingly fast and remarkably powerful small language model, with a PyTorch implementation of grouped query attention (GQA)


MyDarapy/SmolLM-experiments-with-grouped-query-attention


Experiments in Improving Sub-billion-Scale LLM Design

Some of the techniques used in the LLM pretraining design include the following (minimal sketches of each are given after the list):

  • Embedding sharing (tying the input embedding and output head weights)
  • Grouped query attention (GQA)
  • SwiGLU activations for the multilayer perceptron (MLP) layers
  • Immediate block-wise weight sharing
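
Embedding sharing reuses the token-embedding matrix as the output (LM head) projection, which removes one of the two largest weight matrices in a sub-billion-parameter model. The sketch below is a minimal illustration under assumed sizes (vocab_size=49152, dim=576), not necessarily this repo's exact module:

```python
import torch
import torch.nn as nn


class TinyTiedLM(nn.Module):
    """Illustrative model showing input/output embedding weight tying."""

    def __init__(self, vocab_size: int = 49152, dim: int = 576):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.lm_head = nn.Linear(dim, vocab_size, bias=False)
        # Share one weight matrix: the LM head reuses the token-embedding table,
        # removing vocab_size * dim parameters from the model.
        self.lm_head.weight = self.embed.weight

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        hidden = self.embed(token_ids)  # (batch, seq, dim)
        # In a full model, the transformer blocks would run here.
        return self.lm_head(hidden)     # (batch, seq, vocab_size)
```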
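
Grouped query attention lets several query heads share a single key/value head, shrinking the KV projections and KV cache. Below is a minimal PyTorch sketch; the hyperparameters in the usage line (dim=576, 9 query heads, 3 KV heads) are illustrative assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GroupedQueryAttention(nn.Module):
    def __init__(self, dim: int, n_heads: int, n_kv_heads: int):
        super().__init__()
        assert n_heads % n_kv_heads == 0, "query heads must divide evenly into KV heads"
        self.n_heads = n_heads
        self.n_kv_heads = n_kv_heads
        self.head_dim = dim // n_heads
        # Queries keep all heads; keys/values are projected to fewer heads.
        self.q_proj = nn.Linear(dim, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.v_proj = nn.Linear(dim, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        bsz, seqlen, _ = x.shape
        q = self.q_proj(x).view(bsz, seqlen, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(bsz, seqlen, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(bsz, seqlen, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so every group of query heads shares one KV head.
        repeat = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(repeat, dim=1)
        v = v.repeat_interleave(repeat, dim=1)
        out = F.scaled_dot_product_attention(q, k, v, is_causal=True)
        out = out.transpose(1, 2).reshape(bsz, seqlen, -1)
        return self.o_proj(out)


# Example: 9 query heads grouped over 3 KV heads (illustrative sizes).
attn = GroupedQueryAttention(dim=576, n_heads=9, n_kv_heads=3)
y = attn(torch.randn(2, 16, 576))  # -> (2, 16, 576)
```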

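The SwiGLU MLP replaces the usual single up-projection with a gated pair of projections, where a SiLU-activated gate multiplies the up-projected input before the down-projection. A minimal LLaMA-style sketch, with layer names and sizes assumed for illustration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class SwiGLUMLP(nn.Module):
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.gate_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.up_proj = nn.Linear(dim, hidden_dim, bias=False)
        self.down_proj = nn.Linear(hidden_dim, dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # SwiGLU: SiLU(x W_gate) elementwise-multiplied with (x W_up),
        # then projected back down to the model dimension.
        return self.down_proj(F.silu(self.gate_proj(x)) * self.up_proj(x))
```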
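Block-wise weight sharing stores one set of block weights but applies it to the activations more than once in a row, increasing effective depth without increasing parameter count. The sketch below is a simplified illustration under assumed structure (the block class and repeat factor are placeholders, not this repo's exact layout):

```python
import copy

import torch
import torch.nn as nn


class SharedBlockStack(nn.Module):
    """Applies each unique block `repeats` times consecutively."""

    def __init__(self, block: nn.Module, n_unique_blocks: int, repeats: int = 2):
        super().__init__()
        # Only n_unique_blocks sets of weights are stored...
        self.blocks = nn.ModuleList(copy.deepcopy(block) for _ in range(n_unique_blocks))
        self.repeats = repeats

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # ...but each block is run `repeats` times back to back,
        # so depth doubles (for repeats=2) at no extra parameter cost.
        for block in self.blocks:
            for _ in range(self.repeats):
                x = block(x)
        return x
```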