An awesome list of all kinds of LLM-related papers
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks: This paper introduces a general-purpose fine-tuning recipe for retrieval-augmented generation (RAG) -- models which combine pre-trained parametric and non-parametric memory for language generation.
- Retrieval-Augmented Generation with Knowledge Graphs for Customer Service Question Answering
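- Denoising Diffusion Probabilistic Models: This paper presents high-quality image synthesis results using diffusion probabilistic models, a class of latent variable models, in the formulation now known as denoising diffusion probabilistic models (DDPMs).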
- Ferret-UI: Grounded Mobile UI Understanding with Multimodal LLMs: Apple's paper on multimodal LLMs for mobile UI understanding.
- Attention Is All You Need: This paper introduces the Transformer architecture, which is based on the multi-head attention mechanism.
- Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: This work introduces an efficient method to scale Transformer-based Large Language Models (LLMs) to infinitely long inputs with bounded memory and computation.
- A Primer on the Inner Workings of Transformer-based Language Models: This paper presents a technical introduction to current techniques used to interpret the inner workings of Transformer-based language models.
- KAN: Kolmogorov-Arnold Networks: Inspired by the Kolmogorov-Arnold representation theorem, the paper proposes Kolmogorov-Arnold Networks (KANs) as promising alternatives to Multi-Layer Perceptrons (MLPs).
- A Survey of Large Language Models
- A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
- A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT
- Multilingual Large Language Model: A Survey of Resources, Taxonomy and Frontiers
- A Comprehensive Overview of Large Language Models
- RAG and RAU: A Survey on Retrieval-Augmented Language Model in Natural Language Processing
- Spider: A Large-Scale Human-Labeled Dataset for Complex and Cross-Domain Semantic Parsing and Text-to-SQL Task
- Seq2SQL: Generating Structured Queries from Natural Language using Reinforcement Learning
- DIN-SQL: Decomposed In-Context Learning of Text-to-SQL with Self-Correction
- KaggleDBQA: Realistic Evaluation of Text-to-SQL Parsers
- Can LLM Already Serve as A Database Interface? A BIg Bench for Large-Scale Database Grounded Text-to-SQLs
- CodeGemma: Open Code Models Based on Gemma
- Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone
- Code Llama: Open Foundation Models for Code
- Advancing Multimodal Medical Capabilities of Gemini
- An Introduction to Vision-Language Modeling
- Efficient Training of Language Models to Fill in the Middle
- Better & Faster Large Language Models via Multi-token Prediction
- SWE-bench: Can Language Models Resolve Real-World GitHub Issues?: SWE-bench is an evaluation framework comprising 2,294 software engineering problems from GitHub, designed to test language models on complex code-editing tasks requiring extensive contextual understanding and reasoning.
- GAIA: A Benchmark for General AI Assistants
- AutoDev: Automated AI-Driven Development
- Mobile-Agent-v2: Mobile Device Operation Assistant with Effective Navigation via Multi-Agent Collaboration
- Mixture-of-Agents Enhances Large Language Model Capabilities