Reading List for Vision Language Models
-
Transformers and Attention Mechanisms
Main paper: Attention Is All You Need
Video by one of the authors: Attention is all you need; Łukasz Kaiser | Masterclass
Transformers Family Lilian Weng Blog
An introduction to Transformers Prof. Turner
Neural Machine Translation by jointly learning to align and translate (2014)
precursor to transformer
A survey of transformers (2021)
-
Hands-on Transformer Design
-
-
Light and Efficient Transformers
A Practical Survey on Faster and Lighter Transformers (2021)
Efficient Transformers: A Survey (2020)
LoRA: Low-Rank Adaptation of Large Language Models (2021)
.....
-
Vision Transformers
.....
-
Diffusion Models
Deep Unsupervised Learning using Nonequilibrium Thermodynamics (2015)
Scalable Diffusion Models with Transformers (2022)
Diffusion Models: A comprehensive survey (2022)
Diffusion models Lilian Weng Blog
Score-Based Generative Modelling Yang Song Blog
highly recommended
Consistency Models (2023)
recent
Tutorial on Diffusion Models for Imaging and Vision
new summary!
.....
-
Contrastive Learning (self-supervised learning)
Survey of Contrastive Learning (Sept. 2020)
A Simple Framework for Contrastive Learning of Visual Representations (2020)
Self-supervised Learning: Generative or Contrastive (2021)
Contrastive Learning for NLP (Full Reading list)
An exhaustive list of papers on contrastive learning, maintained by Rui Zhang et al.; highly recommended
-
Reinforcement Learning (Plus: RL in LLMs)
Reinforcement Learning: An Introduction (Sutton & Barto)
Reinforcement Learning: Bit by Bit
A Survey on Model-based Reinforcement Learning (2022), Fan-Ming Luo et al.
Model-based Reinforcement Learning: A Survey (2022), Thomas Moerland et al.
Deep Reinforcement Learning (2023)
book draft
Learning to summarize from human feedback (2022)
main: OpenAI
Deep Reinforcement Learning from Human Preferences (2017)
influential
Training language models to follow instructions with human feedback
Survey on preference-based reinforcement learning methods (2017)
Scaling Laws for Reward Model Overoptimization
RLHF illustrations by Hugging Face, with references
web page; check the reference list at the end of the article
Red Teaming LLMs
Important for sanitizing LLMs!
...
-
Vision-Language Models
Intro to Vision-Language Modelling
Vision-Language Pre-Training: Basics, Recent Advances, and Future Trends
Foundations & Trends in Multimodal Machine Learning: Principles, Challenges, and Open Questions
VLP: A survey on Vision-Language Pre-training
Vision-Language Pretraining: Current Trends and the Future
IDAP||Deepmind
-
Generalist Agents
Scaling Instructable Agents Across Many Simulated Worlds (SIMA)
see blog post below
(molecular/drug discovery, genomics, weather, protein structure, disaster response systems, quality control, ...)
Opportunities for use in resource-constrained environments and devices ...