- This repository is dedicated to cataloging affordable yet powerful large language models (LLMs).
- It provides insights into the latest models, including parameter counts, fine-tuning datasets and techniques, and hardware requirements/costs.
- With this repository, you can quickly find the essential information you need to choose and run an affordable LLM.
- EleutherAI: GPT-J, GPT-Neo, GPT-NeoX, Pythia / Dolly
- Hugging Face BigScience: BLOOM / BELLE, Phoenix
- Meta: OPT, Galactica, LLaMA / Phoenix, Alpaca, Vicuna
- LAION AI: Open-Assistant / HuggingChat
- Tsinghua: GLM / ChatGLM-6B
- Cerebras: Cerebras-GPT
- BlinkDL: RWKV
- Microsoft: DeepSpeedChat
- ColossalAI: ColossalChat
- Google: BERT, T5, Flan, Switch Transformers, LaMDA, FLAN-T5, PaLM, PaLM-E
- DeepMind: Chinchilla, Gopher, Sparrow
- Anthropic: Claude
- OpenAI: GPT-1, GPT-2, GPT-3, WebGPT, InstructGPT, ChatGPT, GPT-4
Project | Base Model | Data | Fine-tune | Hardware / Cost |
---|---|---|---|---|
Stanford/Alpaca | LLaMA-7B | 52K instruction-following examples, generated in self-instruct style using text-davinci-003 (see the data-generation sketch below the table) | SFT | 3 hours on 8 80GB A100s, ~$500 (data) + ~$100 (training) |
NLPCloud/instruct-gpt-j | GPT-J-6B | 52K Alpaca | SFT | fp16 model deploys well on a 16GB Tesla T4 |
LianjiaTech/BELLE | BLOOMZ-7B1-mt | 2M Chinese instructions generated Alpaca-style | SFT | 8-bit GPTQ quantization runs on a 12GB GPU |
LianjiaTech/BELLE | LLaMA-7B | same as above | SFT | 4-bit ggml quantization works well on an M1 Mac |
Alpaca-LoRA | LLaMA-7B | 52K Alpaca; updated to the MSFT LLaMA-GPT4 dataset | SFT with LoRA | a few hours on a single RTX 4090 (24GB) |
Databricks/Dolly-v1-6B | GPT-J-6B | 52K Alpaca | SFT | |
Databricks/Dolly-v2-12B | Pythia-12B | databricks-dolly-15k, generated by Databricks employees in the capability domains from the InstructGPT paper | SFT | about 3.5 hours on 8 V100s with fp16 for 1 epoch |
GPT4All | LLaMA-7B | ~800k GPT-3.5-Turbo Generations | SFT with LoRA | |
HIT&HFL/Chinese-LLaMA-Alpaca | LLaMA-7B/13B | about 2M Chinese and English examples | adds 20K Chinese SentencePiece tokens to the vocab to improve Chinese decoding efficiency; uses DeepSpeed ZeRO-2 | pre-training on a 20GB general Chinese corpus on 16 A100s; SFT with LoRA on 16 A100s |
HIT&HFL/Chinese-LLaMA-Plus-7B | LLaMA-7B | LLaMA re-pre-trained on a larger (120GB) general corpus, then fine-tuned with a 4M-instruction dataset | SFT with LoRA (larger rank) | |
THUDM/ChatGLM-6B | | | | |
LLaMA-Adapter | LLaMA-7B | 52K Alpaca | SFT with LLaMA-Adapter | reduces training from 3 hours to 1 hour; trains 1.2M parameters instead of 7B |
FastChat/Vicuna | LLaMA-7B/13B | 70K user-shared conversations gathered from ShareGPT.com | SFT, 40x larger dataset and 4x longer sequences than Alpaca | 4/8 A100s, $140/$300 for training the 7B/13B models; impresses GPT-4 with ~90% of ChatGPT quality |
BAIR/Koala | LLaMA-13B | around 60K dialogues shared by users on ShareGPT; Human ChatGPT Comparison Corpus (HC3); open-source data... | SFT with JAX/Flax | 2 epochs in 6 hours on 8 A100s; beats ChatGPT on 180 real user queries |
Baize | LLaMA-7B/13B/30B | 100K dialogues generated by letting ChatGPT chat with itself; QA and healthcare datasets | SFT with LoRA | runs on 80GB A100s |
Firefly | bloom-1b4/2b6-zh | 1.1M instructions built from 23 Chinese NLP tasks; BELLE-0.5M-cn | vocab reduced from 250K to 46K tokens; SFT | |
Arxiv Chat | built on ChatGPT (QA), LangChain (main logic), and h2oai (UI) | | | |
huggingface/StackLLaMA | LLaMA-7B | Stack Exchange dataset (10M < N < 100M) | SFT + RLHF | (2+8) bytes/param × 7B ≈ 70GB, so an 80GB A100 works fine; LoRA/PEFT makes a 50-60B model on a single A100 possible |
MSFT/LLaMA-GPT4 | LLaMA-7B | 52K Alpaca prompts with responses regenerated by GPT-4 | SFT, RM | |
MSFT/DeepSpeed Chat | | | supports SFT, RM, RLHF | efficiency and affordability |
ColossalAI/ColossalChat | | | supports SFT, RM, RLHF | quick preview |
Phoenix | LLaMA-7B/13B | a vast collection of popular multilingual open-source datasets | SFT | |
fudan/MOSS-003 | MOSS-16B | ~1.1M text-davinci-003-generated self-instruct examples, including ~300K plugin examples (text-to-image, equations, etc.) | SFT | fp16 fine-tuning on 2 A100s, or 4/8-bit fine-tuning on a single 3090 |
replit/replit-code-v1-3b | 2.7B params | entirely code, 525B tokens | | 10 days of training; benchmarks better than Codex |
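
Several rows above (Alpaca, BELLE, MOSS) build their training data by prompting an OpenAI model in self-instruct style. A minimal sketch of that idea, assuming the legacy `openai` Python SDK (pre-1.0), an API key in the environment, and a hypothetical list of seed instructions:

```python
# Self-instruct-style data generation sketch (assumptions: legacy openai<1.0
# SDK, OPENAI_API_KEY set in the environment; seed tasks are hypothetical).
import json
import openai

seed_tasks = [
    "Give three tips for staying healthy.",
    "Explain what a binary search tree is.",
]

PROMPT = (
    "You are generating instruction-following training data.\n"
    "Example instructions:\n{examples}\n"
    "Write one new, diverse instruction and a high-quality response.\n"
    'Return JSON with keys "instruction" and "output".'
)

def generate_example(seeds):
    prompt = PROMPT.format(examples="\n".join(f"- {s}" for s in seeds))
    resp = openai.ChatCompletion.create(
        model="gpt-3.5-turbo",  # Alpaca itself used text-davinci-003
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    return json.loads(resp["choices"][0]["message"]["content"])

dataset = [generate_example(seed_tasks) for _ in range(3)]
with open("self_instruct_sample.json", "w") as f:
    json.dump(dataset, f, indent=2, ensure_ascii=False)
```

The real pipelines add deduplication, quality filtering, and far larger seed pools; this only illustrates the prompt-and-collect loop.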
- SFT: raw full fine-tuning, LoRA, PEFT; Chinese vocab extension; instruction datasets generated with ChatGPT/GPT-4, or human-labeled datasets such as databricks-dolly-15k (a LoRA SFT sketch follows this list);
- RM: GPT-4 assigns scores using its ability to judge response quality; open-source preference datasets;
- RLHF: DeepSpeedChat / ColossalChat;
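
Most of the LoRA rows in the table reduce to the same recipe: load the base model in fp16, inject low-rank adapters, and run SFT on an instruction dataset. A minimal sketch using `transformers` + `peft`; the checkpoint path, target modules, and hyperparameters are illustrative assumptions, not taken from any specific project:

```python
# LoRA SFT sketch (assumptions: transformers + peft installed; the checkpoint
# path is a hypothetical placeholder; hyperparameters are illustrative).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base = "path/to/llama-7b-hf"  # hypothetical: any LLaMA/GPT-J-style causal LM checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.float16, device_map="auto"
)

# Inject low-rank adapters into the attention projections; only these small
# matrices receive gradients while the 7B base weights stay frozen.
lora_cfg = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],  # module names as in LLaMA/GPT-J attention
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()  # typically well under 1% of total parameters
```

From here, a standard `transformers` Trainer loop over any of the instruction datasets above completes the SFT stage; only the adapter weights (a few MB) need to be saved.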
Data & Model Parallel
- Data Parallel
- Tensor Parallel
- Pipeline Parallel
- Zero Redundancy Optimizer (ZeRO) (DeepSpeed, often combined with CPU offloading; see the config sketch after this list)
- Sharded DDP (FSDP)
- Mixture-of-Experts (MoE)
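
ZeRO is usually switched on through a DeepSpeed config rather than model code changes. A minimal sketch of a ZeRO stage-2 setup with CPU optimizer offloading; the toy model, batch sizes, and learning rate are illustrative stand-ins:

```python
# ZeRO-2 + CPU optimizer offload sketch (assumptions: deepspeed installed and
# the script launched with the deepspeed launcher; the toy model, batch sizes,
# and learning rate are illustrative stand-ins for a real LLM setup).
import torch
import deepspeed

model = torch.nn.Sequential(  # stand-in for a real transformer
    torch.nn.Linear(1024, 4096), torch.nn.GELU(), torch.nn.Linear(4096, 1024)
)

ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},  # 16-bit mixed precision
    "optimizer": {"type": "AdamW", "params": {"lr": 2e-5}},
    "zero_optimization": {
        "stage": 2,                              # shard optimizer states + gradients
        "offload_optimizer": {"device": "cpu"},  # keep optimizer states in CPU RAM
    },
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)
```

Launched with the `deepspeed` launcher across several GPUs, stage 2 shards optimizer states and gradients over ranks, and the CPU offload trades GPU memory for host RAM and PCIe bandwidth.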
Param Efficient
- LoRA
- PEFT
- Gradient Checkpointing
- Offloading (ZeRO)
- Memory-Efficient Optimizers
- 16-bit mixed precision
- 8-bit: bitsandbytes / triton (see the quantized-loading sketch after this list)
- 4-bit: gptq / ggml
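
8-bit and 4-bit loading are what let the 7B-13B entries above run on consumer GPUs. A minimal inference sketch using `transformers` with `bitsandbytes`; the model name is just an example from the table, and 4-bit loading assumes a recent enough transformers/bitsandbytes:

```python
# Quantized inference sketch (assumptions: bitsandbytes installed and a
# recent transformers; the model name is just an example from the table).
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "bigscience/bloomz-7b1-mt"
tokenizer = AutoTokenizer.from_pretrained(name)

# 8-bit weights roughly halve memory vs fp16 (~7-8GB for a 7B model).
model = AutoModelForCausalLM.from_pretrained(name, load_in_8bit=True, device_map="auto")

# 4-bit alternative on newer transformers/bitsandbytes:
# model = AutoModelForCausalLM.from_pretrained(name, load_in_4bit=True, device_map="auto")

inputs = tokenizer("What is ZeRO offloading?", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```

For GPTQ or ggml formats (as in the BELLE rows), the dedicated gptq and llama.cpp toolchains are used instead of `load_in_*`.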
- TruthfulQA: the benchmark comprises 817 questions spanning 38 categories, including health, law, finance, and politics.
- Chinese-LLaMA-Alpaca: the Chinese benchmark contains 10 tasks with 20 examples each.
- https://github.com/EleutherAI/lm-evaluation-harness: EleutherAI's few-shot evaluation harness for language models (see the sketch after this list).
- MMLU: English LLM evaluation benchmark covering 57 subjects.
- https://github.com/Felixgithub2017/MMCU: zero/few-shot evaluation on 15 Chinese tasks, covering medicine, law, psychology, and education.
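
A minimal sketch of running one of these benchmarks through the lm-evaluation-harness Python API; the model type string, task name, and result keys are assumptions that vary between harness versions:

```python
# Evaluation sketch (assumptions: lm_eval installed from the
# EleutherAI/lm-evaluation-harness repo; the model type "hf-causal", the task
# name "truthfulqa_mc", and the result keys may differ between versions).
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf-causal",
    model_args="pretrained=EleutherAI/pythia-1.4b",
    tasks=["truthfulqa_mc"],  # TruthfulQA multiple-choice task
    num_fewshot=0,
)
print(results["results"])
```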