The Role of Small Models

This work is ongoing, and we welcome any comments or suggestions.

Please feel free to reach out if you find we have overlooked any relevant papers.

What is the Role of Small Models in the LLM Era: A Survey

Lihu Chen¹ Gaël Varoquaux²

¹ Imperial College London, UK ² Soda, Inria Saclay, France

Content List

Collaboration
- SMs Enhance LLMs
- LLMs Enhance SMs
  - Knowledge Distillation
    - Black-box Distillation
    - White-box Distillation
  - Data Synthesis
    - Data Augmentation
    - Training Data Generation
Competition

Collaboration

SMs Enhance LLMs

Data Curation

Curating pre-training data

Title	Topic	Venue	Code
Data selection for language models via importance resampling	Data Selection
When Less is More: Investigating Data Pruning for Pretraining LLMs at Scale	Data Selection
CCNet: Extracting High Quality Monolingual Datasets from Web Crawl Data	Data Selection
QuRating: Selecting High-Quality Data for Training Language Models	Data Selection
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining	Data Reweighting

Curating Instruction-tuning Data

Title	Topic	Venue	Code
MoDS: Model-oriented Data Selection for Instruction Tuning	Data Selection
LESS: Selecting Influential Data for Targeted Instruction Tuning	Data Selection
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning	Data Selection

Weak-to-Strong Paradigm

Using weaker (smaller) models to align stronger (larger) models

Title	Topic	Venue	Code
Weak-to-Strong Generalization: Eliciting Strong Capabilities With Weak Supervision	Weak-to-Strong
Weak-to-Strong Search: Align Large Language Models via Searching over Small Language Models	Weak-to-Strong
Co-Supervised Learning: Improving Weak-to-Strong Generalization with Hierarchical Mixture of Experts	Weak-to-Strong
Improving Weak-to-Strong Generalization with Reliability-Aware Alignment	Weak-to-Strong
Aligner: Efficient Alignment by Learning to Correct	Weak-to-Strong
Vision Superalignment: Weak-to-Strong Generalization for Vision Foundation Models	Weak-to-Strong
Theoretical Analysis of Weak-to-Strong Generalization	Weak-to-Strong

Efficient Inference

Ensembling different-size models to reduce inference costs

Title	Topic	Venue	Code
Efficient Edge Inference by Selective Query	Model Cascading
FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance	Model Cascading
Data Shunt: Collaboration of Small and Large Models for Lower Costs and Better Performance	Model Cascading
AutoMix: Automatically Mixing Language Models	Model Cascading
FrugalML: How to use ML Prediction APIs more accurately and cheaply	Model Cascading
Model Cascading: Towards Jointly Improving Efficiency and Accuracy of NLP Systems	Model Cascading
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models	Model Routing
Tryage: Real-time, intelligent Routing of User Prompts to Large Language Models	Model Routing
OrchestraLLM: Efficient Orchestration of Language Models for Dialogue State Tracking	Model Routing
RouteLLM: Learning to Route LLMs with Preference Data	Model Routing
Fly-Swat or Cannon? Cost-Effective Language Model Choice via Meta-Modeling	Model Routing
Hybrid LLM: Cost-Efficient and Quality-Aware Query Routing	Model Routing
LLM-BLENDER: Ensembling Large Language Models with Pairwise Ranking and Generative Fusion	Model Routing
RouterBench: A Benchmark for Multi-LLM Routing System	Model Routing
Large Language Model Routing with Benchmark Datasets	Model Routing
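
As a rough illustration of the model cascading idea in the table above, the sketch below tries a small model first and escalates to a large model only when the small model's confidence is low. It is not taken from any of the listed papers: the `Model` interface, the `cascade` function, the toy models, and the 0.8 threshold are all hypothetical.

```python
from typing import Callable, Tuple

# Hypothetical interface: a model maps a query to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]


def cascade(query: str, small: Model, large: Model,
            threshold: float = 0.8) -> Tuple[str, str]:
    """Return (answer, name of the model that produced it)."""
    answer, confidence = small(query)
    if confidence >= threshold:
        # The cheap model is confident enough; skip the expensive call.
        return answer, "small"
    answer, _ = large(query)
    return answer, "large"


# Toy stand-ins for real models (assumptions for illustration only).
def toy_small(query: str) -> Tuple[str, float]:
    # Pretend the small model is only confident on short queries.
    return "short answer", 0.9 if len(query) < 20 else 0.3


def toy_large(query: str) -> Tuple[str, float]:
    return "detailed answer", 0.99
```

Routing differs from cascading in that a separate router picks one model before any model is called, so the small model is not always queried first.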

Speculative Decoding

Title	Topic	Venue	Code
Fast Inference from Transformers via Speculative Decoding	Speculative Decoding
Unlocking Efficiency in Large Language Model Inference: A Comprehensive Survey of Speculative Decoding	Speculative Decoding
Accelerating Large Language Model Decoding with Speculative Sampling	Speculative Decoding
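
The accept/reject rule at the core of speculative sampling can be sketched for a single token as follows. This is a minimal toy version under stated assumptions: `p` and `q` are plain probability lists standing in for the large (target) and small (draft) model distributions, and the function names are hypothetical. Real systems draft several tokens and verify them with the large model in one parallel pass.

```python
import random
from typing import List, Sequence


def residual(p: Sequence[float], q: Sequence[float]) -> List[float]:
    """Normalized max(0, p - q), used to resample after a rejection."""
    diff = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    total = sum(diff)
    # total is 0 only when p equals q, in which case a draft is never rejected.
    return [d / total for d in diff] if total > 0 else list(p)


def speculative_step(p: Sequence[float], q: Sequence[float],
                     rng: random.Random) -> int:
    """Draw one token so that the result is distributed according to p."""
    vocab = range(len(q))
    # The small (draft) model proposes a token.
    draft = rng.choices(vocab, weights=q)[0]
    # The large (target) model accepts it with probability min(1, p/q).
    if rng.random() < min(1.0, p[draft] / q[draft]):
        return draft
    # On rejection, resample from the residual distribution.
    return rng.choices(vocab, weights=residual(p, q))[0]
```

When the draft distribution is close to the target distribution, most drafts are accepted, which is where the speed-up comes from.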

Evaluating LLMs

Using SMs to evaluate LLMs' generations

Title	Topic	Venue	Code
BERTScore: Evaluating Text Generation with BERT	General Evaluation
BARTScore: Evaluating Generated Text as Text Generation	General Evaluation
Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation	Uncertainty
SelfCheckGPT: Zero-Resource Black-Box Hallucination Detection for Generative Large Language Models	Uncertainty
ProxyLM: Predicting Language Model Performance on Multilingual Tasks via Proxy Models	Performance Prediction

Domain Adaptation

Using domain-specific SMs to adjust the token probabilities of LLMs at decoding time

Title	Topic	Venue	Code
CombLM: Adapting Black-Box Language Models through Small Fine-Tuned Models	White-box Domain Adaptation
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning	White-box Domain Adaptation
Tuning Language Models by Proxy	White-box Domain Adaptation
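
One common way to adjust token probabilities at decoding time is to add the logit difference between a fine-tuned and an untuned small model to the large model's logits. The sketch below is in that spirit and is not the exact method of any listed paper; the function names and the toy logits are assumptions, and all three models are assumed to share a vocabulary.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def adjusted_distribution(large_logits: Sequence[float],
                          small_tuned_logits: Sequence[float],
                          small_base_logits: Sequence[float]) -> List[float]:
    """Shift the large model's logits by what fine-tuning changed in the small model."""
    shifted = [
        big + (tuned - base)
        for big, tuned, base in zip(large_logits, small_tuned_logits, small_base_logits)
    ]
    return softmax(shifted)
```

If the tuned and base small models agree, the shift is zero and the large model's distribution is returned unchanged.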

Using domain-specific SMs to generate knowledge for LLMs at reasoning time

Title	Topic	Venue	Code
Knowledge Card: Filling LLMs' Knowledge Gaps with Plug-in Specialized Language Models	Black-box Domain Adaptation
BLADE: Enhancing Black-box Large Language Models with Small Domain-Specific Models	Black-box Domain Adaptation

Retrieval Augmented Generation

Using SMs to retrieve knowledge for enhancing generations:

Title	Topic	Venue	Code
Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks	Documents
KnowledGPT: Enhancing Large Language Models with Retrieval and Storage Access on Knowledge Bases	Knowledge Bases
End-to-End Table Question Answering via Retrieval-Augmented Generation	Tables
DocPrompting: Generating Code by Retrieving the Docs	Code
Toolformer: Language Models Can Teach Themselves to Use Tools	Tools
Retrieval-Augmented Multimodal Language Modeling	Images

Prompt-based Reasoning

Using SMs to augment prompts for LLMs

Title	Topic	Venue	Code
UPRISE: Universal Prompt Retrieval for Improving Zero-Shot Evaluation	Retrieving Prompts
Small Language Models Fine-tuned to Coordinate Larger Language Models improve Complex Reasoning	Decomposing Complex Problems
Small Models are Valuable Plug-ins for Large Language Models	Generating Pseudo Labels
Can Small Language Models Help Large Language Models Reason Better?: LM-Guided Chain-of-Thought	Generating Pseudo Labels
CaLM: Contrasting Large and Small Language Models to Verify Grounded Generation	Generating Feedback
Small Language Models Improve Giants by Rewriting Their Outputs	Generating Feedback

Deficiency Repair

Developing SM plugins to repair deficiencies:

Title	Topic	Venue	Code
Small Agent Can Also Rock! Empowering Small Language Models as Hallucination Detector	Hallucinations
Reconfidencing LLMs from the Grouping Loss Perspective	Hallucinations
Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost	Out-Of-Vocabulary Words

Contrasting LLMs and SMs for better generations:

Title	Topic	Venue	Code
Contrastive Decoding: Open-ended Text Generation as Optimization	Reducing Repeated Texts
Alleviating Hallucinations of Large Language Models through Induced Hallucinations	Mitigating Hallucinations
Contrastive Decoding Improves Reasoning in Large Language Models	Augmenting Reasoning Capabilities
CoGenesis: A Framework Collaborating Large and Small Language Models for Secure Context-Aware Instruction Following	Safeguarding Privacy
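
For the contrastive decoding entries above, a minimal single-step sketch is given below: it picks the token that maximizes the log-probability gap between the large and small model, restricted to tokens the large model itself finds plausible. The function name, the toy distributions, and the default `alpha` are assumptions, and `p_small` is assumed to be strictly positive.

```python
import math
from typing import Sequence


def contrastive_pick(p_large: Sequence[float], p_small: Sequence[float],
                     alpha: float = 0.1) -> int:
    """Return the index of the token chosen by one contrastive decoding step."""
    # Plausibility constraint: keep tokens with at least alpha * max probability
    # under the large model, so rare tokens cannot win on ratio alone.
    cutoff = alpha * max(p_large)
    candidates = [i for i, p in enumerate(p_large) if p >= cutoff]
    # Among plausible tokens, prefer those the large model likes more than the small one.
    return max(candidates, key=lambda i: math.log(p_large[i]) - math.log(p_small[i]))
```

Without the plausibility cutoff, a token with very low probability under both models could be selected only because the small model rates it even lower.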

LLMs Enhance SMs

Knowledge Distillation

Black-box Distillation:

Title	Topic	Venue	Code
Explanations from Large Language Models Make Small Reasoners Better	Chain-Of-Thought Distillation
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes	Chain-Of-Thought Distillation
Distilling Reasoning Capabilities into Smaller Language Models	Chain-Of-Thought Distillation
Teaching Small Language Models to Reason	Chain-Of-Thought Distillation
Symbolic Chain-of-Thought Distillation: Small Models Can Also "Think" Step-by-Step	Chain-Of-Thought Distillation
Specializing Smaller Language Models towards Multi-Step Reasoning	Chain-Of-Thought Distillation
TinyLLM: Learning a Small Student from Multiple Large Language Models	Chain-Of-Thought Distillation
Lion: Adversarial Distillation of Proprietary Large Language Models	Instruction Following Distillation
Selective Reflection-Tuning: Student-Selected Data Recycling for LLM Instruction-Tuning	Instruction Following Distillation

White-box Distillation:

Title	Topic	Venue	Code
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter	Logits
ZeroQuant: Efficient and Affordable Post-Training Quantization for Large-Scale Transformers	Intermediate Features
Less is More: Task-aware Layer-wise Distillation for Language Model Compression	Intermediate Features
MiniLLM: Knowledge Distillation of Large Language Models	Intermediate Features
LLM-QAT: Data-Free Quantization Aware Training for Large Language Models	Intermediate Features

Data Synthesis

Data Augmentation:

Title	Topic	Venue	Code
Improving data augmentation for low resource speech-to-text translation with diverse paraphrasing	Text Paraphrase
Paraphrasing with Large Language Models	Text Paraphrase
Query Rewriting for Retrieval-Augmented Large Language Models	Query Rewriting
LLM vs Small Model? Large Language Model Based Text Augmentation Enhanced Personality Detection Model	Specific Tasks
Data Augmentation for Intent Classification with Off-the-shelf Large Language Models	Specific Tasks
Weakly Supervised Data Augmentation Through Prompting for Dialogue Understanding	Specific Tasks

Training Data Generation:

Title	Topic	Venue	Code
Want To Reduce Labeling Cost? GPT-3 Can Help	Label Annotation
Self-Guided Noise-Free Data Generation for Efficient Zero-Shot Learning	Label Annotation
ZeroGen: Efficient Zero-shot Learning via Dataset Generation	Dataset Generation
Generating Training Data with Language Models: Towards Zero-Shot Language Understanding	Dataset Generation
Increasing Diversity While Maintaining Accuracy: Text Data Generation with Large Language Models and Human Interventions	Dataset Generation
Synthetic Data Generation with Large Language Models for Text Classification: Potential and Limitations	Dataset Generation
Does Synthetic Data Generation of LLMs Help Clinical Text Mining?	Dataset Generation
Exploiting Asymmetry for Synthetic Training Data Generation: SynthIE and the Case of Information Extraction	Dataset Generation
ToxiGen: A Large-Scale Machine-Generated Dataset for Adversarial and Implicit Hate Speech Detection	Dataset Generation

Competition

Computation-constrained Environment

Task-specific Environment

Interpretability-required Environment

Citation

@misc{chen2024rolesmallmodelsllm,
      title={What is the Role of Small Models in the LLM Era: A Survey}, 
      author={Lihu Chen and Gaël Varoquaux},
      year={2024},
      eprint={2409.06857},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2409.06857}, 
}
