Navigate through Enigmatic Labyrinth
A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future

Zheng Chu¹∗, Jingchang Chen¹∗, Qianglong Chen²∗, Weijiang Yu², Tao He¹, Haotian Wang¹, Weihua Peng², Ming Liu¹†, Bing Qin¹, Ting Liu¹

¹Harbin Institute of Technology, Harbin, China

²Huawei Inc., Shenzhen, China

This repository contains the resources for the ACL 2024 paper "Navigate through Enigmatic Labyrinth: A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future".

For more details, please refer to the paper: A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future.

🎉 Updates

2024/06/03 This paper has been accepted to ACL 2024; the camera-ready version is released.
2023/10/17 The second version of our paper has been released; check it on arXiv.
2023/10/15 We have added 44 papers to the reading list, and the v2 paper is on its way.
2023/09/27 The first version of our paper is available on arXiv.
2023/09/22 We created this reading list repository.

We use the 💡 icon to mark articles added since the last version of the paper.

This reading list will be updated periodically. If you have suggestions or find papers we missed, feel free to contact us by submitting an issue or sending an email (zchu@ir.hit.edu.cn).

🎁 Resources

Surveys

A Survey of Deep Learning for Mathematical Reasoning, ACL 2023 [paper]
Reasoning with Language Model Prompting: A Survey, ACL 2023 [paper]
A Survey for In-context Learning, arXiv.2301.00234 [paper]
A Survey of Large Language Models, arXiv.2303.18223 [paper]
Natural Language Reasoning, A Survey, arXiv.2303.14725 [paper]
A Survey on Evaluation of Large Language Models, arXiv.2307.03109 [paper] 💡
A Survey on Large Language Model based Autonomous Agents, arXiv.2308.11432 [paper] 💡
Siren’s Song in the AI Ocean: A Survey on Hallucination in Large Language Models, arXiv.2309.01219 [paper] 💡
Multimodal Foundation Models: From Specialists to General-Purpose Assistants, arXiv.2309.10020 [paper] 💡
Towards Better Chain-of-Thought Prompting Strategies: A Survey, arXiv.2310.04959 [paper] 💡
Survey on Factuality in Large Language Models: Knowledge, Retrieval and Domain-Specificity, arXiv.2310.07521 [paper] 💡
The Mystery and Fascination of LLMs: A Comprehensive Survey on the Interpretation and Analysis of Emergent Abilities, arXiv.2311.00237 [paper] 💡
A Survey on Hallucination in Large Language Models: Principles, Taxonomy, Challenges, and Open Questions, arXiv.2311.05232 [paper] 💡

Blogs

How does GPT Obtain its Ability? Tracing Emergent Abilities of Language Models to their Sources, Dec 2022, Yao Fu’s Notion [blog]
Towards Complex Reasoning: the Polaris of Large Language Models, May 2023, Yao Fu’s Notion [blog]
Prompt Engineering, Mar 2023, Lil’Log [blog]
LLM Powered Autonomous Agents, Jun 2023, Lil’Log [blog]

Projects

HqWu-HITCS/Awesome-LLM-Survey [project]
AGI-Edgerunners/LLM-Planning-Papers [project]

💯 Benchmarks

Mathematical Reasoning

Learning to Solve Arithmetic Word Problems with Verb Categorization, EMNLP 2014 [paper]
Parsing Algebraic Word Problems into Equations, TACL 2015 [paper]
Solving General Arithmetic Word Problems, EMNLP 2015 [paper]
MAWPS: A Math Word Problem Repository, NAACL 2016 [paper]
Program Induction by Rationale Generation: Learning to Solve and Explain Algebraic Word Problems, ACL 2017 [paper]
A Diverse Corpus for Evaluating and Developing English Math Word Problem Solvers, ACL 2020 [paper]
Are NLP Models really able to Solve Simple Math Word Problems?, NAACL 2021 [paper]
Training Verifiers to Solve Math Word Problems, arXiv.2110.14168 [paper]
PAL: Program-aided Language Models, ICML 2023 [paper]
MathQA: Towards Interpretable Math Word Problem Solving with Operation-Based Formalisms, NAACL 2019 [paper]
DROP: A Reading Comprehension Benchmark Requiring Discrete Reasoning Over Paragraphs, NAACL 2019 [paper]
TheoremQA: A Theorem-driven Question Answering dataset, EMNLP 2023 [paper]
TAT-QA: A Question Answering Benchmark on a Hybrid of Tabular and Textual Content in Finance, ACL 2021 [paper]
FinQA: A Dataset of Numerical Reasoning over Financial Data, EMNLP 2021 [paper]
ConvFinQA: Exploring the Chain of Numerical Reasoning in Conversational Finance Question Answering, EMNLP 2022 [paper]
Measuring Mathematical Problem Solving With the MATH Dataset, NeurIPS 2021 [paper]
NumGLUE: A Suite of Fundamental yet Challenging Mathematical Reasoning Tasks, ACL 2022 [paper]
LILA: A Unified Benchmark for Mathematical Reasoning, EMNLP 2022 [paper]
Conic10K: A Challenging Math Problem Understanding and Reasoning Dataset, EMNLP 2023 [paper] 💡

Commonsense Reasoning

Think you have Solved Direct-Answer Question Answering? Try ARC-DA, the Direct-Answer AI2 Reasoning Challenge, arXiv.2102.03315 [paper]
Can a Suit of Armor Conduct Electricity? A New Dataset for Open Book Question Answering, EMNLP 2018 [paper]
PIQA: Reasoning about Physical Commonsense in Natural Language, AAAI 2020 [paper]
CommonsenseQA: A Question Answering Challenge Targeting Commonsense Knowledge, NAACL 2019 [paper]
CommonsenseQA 2.0: Exposing the Limits of AI through Gamification, NeurIPS 2021 [paper]
Event2Mind: Commonsense Inference on Events, Intents, and Reactions, ACL 2018 [paper]
"Going on a vacation" takes longer than "Going for a walk": A Study of Temporal Commonsense Understanding, EMNLP 2019 [paper]
Cosmos QA: Machine Reading Comprehension with Contextual Commonsense Reasoning, EMNLP 2019 [paper]
Does it Make Sense? And Why? A Pilot Study for Sense Making and Explanation, ACL 2019 [paper]
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies, TACL 2021 [paper]
CRoW: Benchmarking Commonsense Reasoning in Real-World Tasks, EMNLP 2023 [paper] 💡

Symbolic Reasoning

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022 [paper]
Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models, arXiv.2206.04615 [paper]
Challenging BIG-Bench Tasks and Whether Chain-of-Thought Can Solve Them, ACL 2023 [paper]

Logical Reasoning

ReClor: A Reading Comprehension Dataset Requiring Logical Reasoning, ICLR 2020 [paper]
LogiQA: A Challenge Dataset for Machine Reading Comprehension with Logical Reasoning, IJCAI 2020 [paper]
ProofWriter: Generating Implications, Proofs, and Abductive Statements over Natural Language, ACL 2021 [paper]
FOLIO: Natural Language Reasoning with First-Order Logic, arXiv.2209.00840 [paper]
Language Models as Inductive Reasoners, arXiv.2212.10923 [paper]
Language Models Are Greedy Reasoners: A Systematic Formal Analysis of Chain-of-Thought, ICLR 2023 [paper]

Multi-modal Reasoning

Visual-Language (Image)

From Recognition to Cognition: Visual Commonsense Reasoning, CVPR 2019 [paper]
VisualCOMET: Reasoning About the Dynamic Context of a Still Image, ECCV 2020 [paper]
Premise-based Multimodal Reasoning: Conditional Inference on Joint Textual and Visual Clues, ACL 2022 [paper]
Learn to Explain: Multimodal Reasoning via Thought Chains for Science Question Answering, NeurIPS 2022 [paper]
Measuring and Improving Chain-of-Thought Reasoning in Vision-Language Models, arXiv.2309.04461 [paper] 💡

Video-Language

What is More Likely to Happen Next? Video-and-Language Future Event Prediction, EMNLP 2020 [paper]
CLEVRER: Collision Events for Video Representation and Reasoning, ICLR 2020 [paper]
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions, CVPR 2021 [paper]
STAR: A Benchmark for Situated Reasoning in Real-World Videos, NeurIPS 2021 [paper]
From Representation to Reasoning: Towards both Evidence and Commonsense Reasoning for Video Question-Answering, CVPR 2022 [paper]
NewsKVQA: Knowledge-Aware News Video Question Answering, PAKDD 2022 [paper]

🚀 Advances

XoT Construction

Manual Construction

Chain-of-Thought Prompting Elicits Reasoning in Large Language Models, NeurIPS 2022 [paper]
PAL: Program-aided Language Models, ICML 2023 [paper]
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper]
MathPrompter: Mathematical Reasoning using Large Language Models, ACL 2023 [paper]
Complexity-Based Prompting for Multi-step Reasoning, ICLR 2023 [paper]

Automatic Construction

Large Language Models are Zero-Shot Reasoners, NeurIPS 2022 [paper]
Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper]
Plan-and-Solve Prompting: Improving Zero-Shot Chain-of-Thought Reasoning by Large Language Models, ACL 2023 [paper]
Automatic Chain of Thought Prompting in Large Language Models, ICLR 2023 [paper]
Reprompting: Automated Chain-of-Thought Prompt Inference Through Gibbs Sampling, arXiv.2305.09993 [paper]
Better Zero-Shot Reasoning with Self-Adaptive Prompting, ACL 2023 [paper] 💡
Enhancing Zero-Shot Chain-of-Thought Reasoning in Large Language Models through Logic, arXiv.2309.13339 [paper] 💡
Agent Instructs Large Language Models to be General Zero-Shot Reasoners, arXiv.2310.03710 [paper] 💡
PromptAgent: Strategic Planning with Language Models Enables Expert-level Prompt Optimization, arXiv.2310.16427 [paper] 💡

Semi-Automatic Construction

Dynamic Prompt Learning via Policy Gradient for Semi-structured Mathematical Reasoning, ICLR 2023 [paper]
Synthetic Prompting: Generating Chain-of-Thought Demonstrations for Large Language Models, arXiv.2302.00618 [paper]
Automatic Prompt Augmentation and Selection with Chain-of-Thought from Labeled Data, EMNLP 2023 [paper]
Explanation Selection Using Unlabeled Data for In-Context Learning, arXiv.2302.04813 [paper]
Boosted Prompt Ensembles for Large Language Models, arXiv.2304.05970 [paper]
Meta-CoT: Generalizable Chain-of-Thought Prompting in Mixed-task Scenarios with Large Language Models, arXiv.2310.06692 [paper] 💡
Self-prompted Chain-of-Thought on Large Language Models for Open-domain Multi-hop Reasoning, EMNLP 2023 [paper] 💡

XoT Structural Variants

Chain Structure

Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks, arXiv.2211.12588 [paper]
PAL: Program-aided Language Models, ICML 2023 [paper]
Chain-of-Symbol Prompting Elicits Planning in Large Language Models, arXiv.2305.10276 [paper]
Automatic Model Selection with Large Language Models for Reasoning, EMNLP 2023 [paper] 💡
Algorithm of Thoughts: Enhancing Exploration of Ideas in Large Language Models, arXiv.2308.10379 [paper]

Tree Structure

Large Language Model Guided Tree-of-Thought, arXiv.2305.08291 [paper]
Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023 [paper]
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, arXiv.2307.15337 [paper]
Tree of Uncertain Thoughts Reasoning for Large Language Models, arXiv.2309.07694 [paper] 💡
Thought Propagation: An Analogical Approach to Complex Reasoning with Large Language Models, arXiv.2310.03965 [paper] 💡
Autonomous Tree-search Ability of Large Language Models, arXiv.2310.10686 [paper] 💡
Probabilistic Tree-of-thought Reasoning for Answering Knowledge-intensive Complex Questions, EMNLP 2023 [paper] 💡

Graph Structure

Graph of Thoughts: Solving Elaborate Problems with Large Language Models, arXiv.2308.09687 [paper]
Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought, arXiv.2308.08614 [paper]
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning, arXiv.2310.03094 [paper] 💡
Resprompt: Residual Connection Prompting Advances Multi-Step Reasoning in Large Language Models, arXiv.2310.04743 [paper] 💡

XoT Enhancement Methods

Verify and Refine

Making Language Models Better Reasoners with Step-Aware Verifier, ACL 2023 [paper]
Successive Prompting for Decomposing Complex Questions, EMNLP 2022 [paper]
Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, ACL 2023 [paper]
Large language models are reasoners with self-verification, arXiv.2212.09561 [paper]
Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023 [paper]
Self-Refine: Iterative Refinement with Self-Feedback, NeurIPS 2023 [paper]
REFINER: Reasoning Feedback on Intermediate Representations, arXiv.2304.01940 [paper]
RCOT: Detecting and Rectifying Factual Inconsistency in Reasoning by Reversing Chain-of-Thought, arXiv.2305.11499 [paper]
Deductive Verification of Chain-of-Thought Reasoning, NeurIPS 2023 [paper]
Forward-Backward Reasoning in Large Language Models for Verification, arXiv.2308.07758 [paper]
SCREWS: A Modular Framework for Reasoning with Revisions, arXiv.2309.13075 [paper] 💡
Chain-of-Verification Reduces Hallucination in Large Language Models, arXiv.2309.11495 [paper] 💡
Large Language Models Cannot Self-Correct Reasoning Yet, arXiv.2310.01798 [paper] 💡
Crystal: Introspective Reasoners Reinforced with Self-Feedback, EMNLP 2023 [paper] 💡
Take a Step Back: Evoking Reasoning via Abstraction in Large Language Models, arXiv.2310.04921 [paper] 💡
Towards Mitigating Hallucination in Large Language Models via Self-Reflection, arXiv.2310.06271 [paper] 💡
MAF: Multi-Aspect Feedback for Improving Reasoning in Large Language Models, EMNLP 2023 [paper] 💡
R^3 Prompting: Review, Rephrase and Resolve for Chain-of-Thought Reasoning in Large Language Models under Noisy Context, EMNLP 2023 [paper] 💡
Ask One More Time: Self-Agreement Improves Reasoning of Language Models in (Almost) All Scenarios, arXiv.2311.08154 [paper] 💡
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing, arXiv.2305.11738 [paper] 💡

Question Decomposition

Successive Prompting for Decomposing Complex Questions, EMNLP 2022 [paper]
Iteratively Prompt Pre-trained Language Models for Chain of Thought, EMNLP 2022 [paper]
Least-to-Most Prompting Enables Complex Reasoning in Large Language Models, ICLR 2023 [paper]
Decomposed Prompting: A Modular Approach for Solving Complex Tasks, ICLR 2023 [paper]
Binding Language Models in Symbolic Languages, ICLR 2023 [paper]
Large Language Models are Versatile Decomposers: Decomposing Evidence and Questions for Table-based Reasoning, SIGIR 2023 [paper]
The Art of SOCRATIC QUESTIONING: Zero-shot Multimodal Reasoning with Recursive Thinking and Self-Questioning, EMNLP 2023 [paper] 💡
Cumulative Reasoning with Large Language Models, arXiv.2308.04371 [paper] 💡
From Complex to Simple: Unraveling the Cognitive Tree for Reasoning with Small Language Models, EMNLP 2023 [paper] 💡

External Knowledge

Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions, ACL 2023 [paper] 💡
Chain-of-Dictionary Prompting Elicits Translation in Large Language Models, arXiv.2305.06575 [paper]
MoT: Pre-thinking and Recalling Enable ChatGPT to Self-Improve with Memory-of-Thoughts, arXiv.2305.05181 [paper]
Chain of Knowledge: A Framework for Grounding Large Language Models with Structured Knowledge Bases, arXiv.2305.13269 [paper]
Boosting Language Models Reasoning with Chain-of-Knowledge Prompting, arXiv.2306.06427 [paper]
Knowledge-Driven CoT: Exploring Faithful Reasoning in LLMs for Knowledge-intensive Question Answering, arXiv.2308.13259 [paper]
Chain-of-Note: Enhancing Robustness in Retrieval-Augmented Language Models, arXiv.2311.09210 [paper] 💡
Measuring and Narrowing the Compositionality Gap in Language Models, EMNLP 2023 [paper] 💡
Enhancing Retrieval-Augmented Large Language Models with Iterative Retrieval-Generation Synergy, EMNLP 2023 [paper] 💡
IAG: Induction-Augmented Generation Framework for Answering Reasoning Questions, EMNLP 2023 [paper] 💡
Leveraging Structured Information for Explainable Multi-hop Question Answering and Reasoning, EMNLP 2023 [paper] 💡

Vote and Rank

Training Verifiers to Solve Math Word Problems, arXiv.2110.14168 [paper]
Self-Consistency Improves Chain of Thought Reasoning in Language Models, ICLR 2023 [paper]
Complexity-Based Prompting for Multi-step Reasoning, ICLR 2023 [paper]
Answering Questions by Meta-Reasoning over Multiple Chains of Thought, EMNLP 2023 [paper]
Discriminator-Guided Multi-step Reasoning with Language Models, arXiv.2305.14934 [paper] 💡
SelfCheck: Using LLMs to Zero-Shot Check Their Own Step-by-Step Reasoning, arXiv.2308.00436 [paper]
Diversity of Thought Improves Reasoning Abilities of Large Language Models, arXiv.2310.07088 [paper] 💡
Universal Self-Consistency for Large Language Model Generation, arXiv.2311.17311 [paper] 💡
Cross-lingual Prompting: Improving Zero-shot Chain-of-Thought Reasoning across Languages, EMNLP 2023 [paper] 💡

Efficiency

Active Prompting with Chain-of-Thought for Large Language Models, arXiv.2302.12246 [paper]
Let's Sample Step by Step: Adaptive-Consistency for Efficient Reasoning with LLMs, arXiv.2305.11860 [paper]
Skeleton-of-Thought: Large Language Models Can Do Parallel Decoding, arXiv.2307.15337 [paper]
Draft & Verify: Lossless Large Language Model Acceleration via Self-Speculative Decoding, arXiv.2309.08168 [paper] 💡
Large Language Model Cascades with Mixture of Thoughts Representations for Cost-efficient Reasoning, arXiv.2310.03094 [paper] 💡

🛸 Frontier Application

Tool Using

MRKL Systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning, arXiv.2205.00445 [paper]
TALM: Tool Augmented Language Models, arXiv.2205.12255 [paper]
ReAct: Synergizing Reasoning and Acting in Language Models, ICLR 2023 [paper]
Toolformer: Language Models Can Teach Themselves to Use Tools, NeurIPS 2023 [paper]
ART: Automatic multi-step reasoning and tool-use for large language models, arXiv.2303.09014 [paper] 💡
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action, arXiv.2303.11381 [paper]
HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face, NeurIPS 2023 [paper]
API-Bank: A Benchmark for Tool-Augmented LLMs, EMNLP 2023 [paper]
ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings, NeurIPS 2023 [paper] 💡
ChatCoT: Tool-Augmented Chain-of-Thought Reasoning on Chat-based Large Language Models, EMNLP 2023 [paper] 💡
On the Tool Manipulation Capability of Open-source Large Language Models, arXiv.2305.16504 [paper] 💡
Large Language Models as Tool Makers, arXiv.2305.17126 [paper] 💡
GEAR: Augmenting Language Models with Generalizable and Efficient Tool Resolution, arXiv.2307.08775 [paper] 💡
ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs, arXiv.2307.16789 [paper] 💡
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models, arXiv.2308.00675 [paper] 💡
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback, arXiv.2309.10691 [paper] 💡
Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning, arXiv.2309.10814 [paper] 💡
MetaTool Benchmark: Deciding Whether to Use Tools and Which to Use, arXiv.2310.03128 [paper] 💡
TaskBench: Benchmarking Large Language Models for Task Automation, arXiv.2311.18760 [paper] 💡

Planning

Large Language Models Still Can't Plan (A Benchmark for LLMs on Planning and Reasoning about Change), NeurIPS 2023 [paper] 💡
Reflexion: Language Agents with Verbal Reinforcement Learning, NeurIPS 2023 [paper]
Self-Refine: Iterative Refinement with Self-Feedback, NeurIPS 2023 [paper]
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency, arXiv.2304.11477 [paper]
Large Language Model Guided Tree-of-Thought, arXiv.2305.08291 [paper]
Tree of Thoughts: Deliberate Problem Solving with Large Language Models, NeurIPS 2023 [paper]
Reasoning with Language Model is Planning with World Model, EMNLP 2023 [paper]
On the Planning Abilities of Large Language Models -- A Critical Investigation, NeurIPS 2023 [paper] 💡
AdaPlanner: Adaptive Planning from Feedback with Language Models, NeurIPS 2023 [paper] 💡
Graph of Thoughts: Solving Elaborate Problems with Large Language Models, arXiv.2308.09687 [paper]
Boosting Logical Reasoning in Large Language Models through a New Framework: The Graph of Thought, arXiv.2308.08614 [paper]
Dynamic Planning with a LLM, arXiv.2308.06391 [paper]
ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning, arXiv.2308.13724 [paper] 💡
TPTU: Task Planning and Tool Usage of Large Language Model-based AI Agents, arXiv.2308.03427 [paper] 💡
You Only Look at Screens: Multimodal Chain-of-Action Agents, arXiv.2309.11436 [paper] 💡
Reason for Future, Act for Now: A Principled Framework for Autonomous LLM Agents with Provable Sample Efficiency, arXiv.2309.17382 [paper] 💡
ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving, arXiv.2309.17452 [paper] 💡
Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models, arXiv.2310.04406 [paper] 💡
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models, arXiv.2310.08582 [paper] 💡
Reverse Chain: A Generic-Rule for LLMs to Master Multi-API Planning, arXiv.2310.04474 [paper] 💡
Eliminating Reasoning via Inferring with Planning: A New Framework to Guide LLMs' Non-linear Thinking, arXiv.2310.12342 [paper] 💡
ToolChain*: Efficient Action Space Navigation in Large Language Models with A* Search, arXiv.2310.13227 [paper] 💡
Plan, Verify and Switch: Integrated Reasoning with Diverse X-of-Thoughts, EMNLP 2023 [paper] 💡
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems, arXiv.2311.11315 [paper] 💡
LINC: A Neurosymbolic Approach for Logical Reasoning by Combining Language Models with First-Order Logic Provers, EMNLP 2023 [paper] 💡
Logic-LM: Empowering Large Language Models with Symbolic Solvers for Faithful Logical Reasoning, EMNLP 2023 [paper] 💡
SATLM: Satisfiability-Aided Language Models Using Declarative Prompting, NeurIPS 2023 [paper] 💡

Distillation

STaR: Bootstrapping Reasoning With Reasoning, NeurIPS 2022 [paper]
Large Language Models Can Self-Improve, EMNLP 2023 [paper]
Teaching Small Language Models to Reason, ACL 2023 [paper]
Large Language Models Are Reasoning Teachers, ACL 2023 [paper]
Symbolic Chain-of-Thought Distillation: Small Models Can Also “Think” Step-by-Step, ACL 2023 [paper]
SCOTT: Self-Consistent Chain-of-Thought Distillation, ACL 2023 [paper]
Specializing Smaller Language Models towards Multi-Step Reasoning, ICML 2023 [paper]
Distilling Step-by-Step! Outperforming Larger Language Models with Less Training Data and Smaller Model Sizes, arXiv.2305.02301 [paper]
Contrastive Decoding: Open-ended Text Generation as Optimization, ACL 2023 [paper]
Contrastive Decoding Improves Reasoning in Large Language Models, arXiv.2309.09117 [paper]
Chain-of-Thought Reasoning is a Policy Improvement Operator, arXiv.2309.08589 [paper] 💡
Design of Chain-of-Thought in Math Problem Solving, arXiv.2309.11054 [paper] 💡
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models, arXiv.2310.05074 [paper] 💡
Guiding Language Model Reasoning with Planning Tokens, arXiv.2310.05707 [paper] 💡
Democratizing Reasoning Ability: Tailored Learning from Large Language Model, EMNLP 2023 [paper] 💡
MCC-KD: Multi-CoT Consistent Knowledge Distillation, EMNLP 2023 [paper] 💡
Teaching Language Models to Self-Improve through Interactive Demonstrations, arXiv.2310.13522 [paper] 💡
Implicit Chain of Thought Reasoning via Knowledge Distillation, arXiv.2311.01460 [paper] 💡

🔭 Future Prospect

Multi-modal XoT

Multimodal Chain-of-Thought Reasoning in Language Models, arXiv.2302.00923 [paper]
Beyond Chain-of-Thought, Effective Graph-of-Thought Reasoning in Large Language Models, arXiv.2305.16582 [paper]
T-SciQ: Teaching Multimodal Chain-of-Thought Reasoning via Large Language Model Signals for Science Question Answering, arXiv.2305.03453 [paper]
Thinking Like an Expert: Multimodal Hypergraph-of-Thought (HoT) Reasoning to boost Foundation Modals, arXiv.2308.06207 [paper]
Tree-of-Mixed-Thought: Combining Fast and Slow Thinking for Multi-hop Visual Reasoning, arXiv.2308.0965 [paper]

Faithful XoT

Verify-and-Edit: A Knowledge-Enhanced Chain-of-Thought Framework, ACL 2023 [paper]
Rethinking with Retrieval: Faithful Large Language Model Inference, arXiv.2301.00303 [paper]
How language model hallucinations can snowball, arXiv.2305.13534 [paper] 💡
Faithful Chain-of-Thought Reasoning, arXiv.2301.13379 [paper]
Boosting Language Models Reasoning with Chain-of-Knowledge Prompting, arXiv.2306.06427 [paper]
Question Decomposition Improves the Faithfulness of Model-Generated Reasoning, arXiv.2307.11768 [paper]
Measuring Faithfulness in Chain-of-Thought Reasoning, arXiv.2307.13702 [paper]
Chain of Natural Language Inference for Reducing Large Language Model Ungrounded Hallucinations, arXiv.2310.03951 [paper] 💡
Teaching Language Models to Hallucinate Less with Synthetic Tasks, arXiv.2310.06827 [paper] 💡

CoT Theory

Text and Patterns: For Effective Chain of Thought, It Takes Two to Tango, arXiv.2209.07686 [paper]
Language Models of Code are Few-Shot Commonsense Learners, EMNLP 2022 [paper] 💡
Towards Understanding Chain-of-Thought Prompting: An Empirical Study of What Matters, ACL 2023 [paper]
Why think step by step? Reasoning emerges from the locality of experience, arXiv.2304.03843 [paper] 💡
Exploring the Curious Case of Code Prompts, arXiv.2304.13250 [paper] 💡
Large Language Models are In-Context Semantic Reasoners rather than Symbolic Reasoners, arXiv.2305.14825 [paper]
Dissecting Chain-of-Thought: A Study on Compositional In-Context Learning of MLPs, arXiv.2305.18869 [paper]
Towards Revealing the Mystery behind Chain of Thought: a Theoretical Perspective, NeurIPS 2023 [paper]
Analyzing Chain-of-Thought Prompting in Large Language Models via Gradient-based Feature Attributions, arXiv.2307.13339 [paper]
The Expressive Power of Transformers with Chain of Thought, arXiv.2310.07923 [paper] 💡
Why Can Large Language Models Generate Correct Chain-of-Thoughts?, arXiv.2310.13571 [paper] 💡
Towards a Mechanistic Interpretation of Multi-Step Reasoning Capabilities of Language Models, EMNLP 2023 [paper] 💡

🚢 Other works

The Unreliability of Explanations in Few-Shot In-Context Learning, arXiv.2205.03401 [paper]
A Dataset and Benchmark for Automatically Answering and Generating Machine Learning Final Exams, arXiv.2206.05442 [paper]
Rationale-Augmented Ensembles in Language Models, arXiv.2207.00747 [paper]
Can language models learn from explanations in context?, EMNLP 2022 [paper]
Inferring Implicit Relations in Complex Questions with Language Models, EMNLP 2022 [paper]
Language Models of Code are Few-Shot Commonsense Learners, EMNLP 2022 [paper]
Solving Quantitative Reasoning Problems with Language Models, NeurIPS 2022 [paper]
JiuZhang: A Chinese Pre-trained Language Model for Mathematical Problem Understanding, SIGKDD 2022 [paper]
Large Language Models are few(1)-shot Table Reasoners, EACL 2023 [paper]
Reasoning Implicit Sentiment with Chain-of-Thought Prompting, ACL 2023 [paper]
Element-aware Summarization with Large Language Models: Expert-aligned Evaluation and Chain-of-Thought Method, ACL 2023 [paper]
Tab-CoT: Zero-shot Tabular Chain of Thought, ACL 2023 [paper]
Recursion of Thought: A Divide-and-Conquer Approach to Multi-Context Reasoning with Language Models, ACL 2023 [paper]
Language models are multilingual chain-of-thought reasoners, ICLR 2023 [paper]
Ask Me Anything: A simple strategy for prompting language models, ICLR 2023 [paper]
Large Language Models Can Be Easily Distracted by Irrelevant Context, ICLR 2023 [paper]
CoT-BERT: Enhancing Unsupervised Sentence Representation through Chain-of-Thought, arXiv.2309.11143 [paper] 💡
Three Questions Concerning the Use of Large Language Models to Facilitate Mathematics Learning, EMNLP 2023 [paper] 💡

📝 Citation

If you find our work helpful, you can cite this paper as:

@inproceedings{chuCoTReasoningSurvey2024,
    title={Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future},
    author={Zheng Chu and Jingchang Chen and Qianglong Chen and Weijiang Yu and Tao He and Haotian Wang and Weihua Peng and Ming Liu and Bing Qin and Ting Liu},
    booktitle={The 62nd Annual Meeting of the Association for Computational Linguistics: ACL 2024, Bangkok, Thailand, August 11–16, 2024},
    publisher={Association for Computational Linguistics},
    year={2024},
    url={https://arxiv.org/abs/2309.15402}
}
