Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation

Takyoung Kim¹,*, Kyungjae Lee², Young Rok Jang², Ji Yong Cho²,³, Gangwoo Kim⁴,*, Minseok Cho², Moontae Lee²,⁵
¹University of Illinois Urbana-Champaign, ²LG AI Research, ³Cornell University, ⁴Korea University, ⁵University of Illinois Chicago
*Work done as a research intern at LG AI Research

Abstract

Interactions with billion-scale large language models typically yield long-form responses due to their extensive parametric capacities, along with retrieval-augmented features. While detailed responses provide insightful viewpoints on a specific subject, they frequently contain redundant and less engaging content that does not meet user interests. In this work, we focus on the role of query outlining (i.e., a selected sequence of queries) in scenarios where users request a specific range of information, namely coverage-conditioned ($C^2$) scenarios. To simulate $C^2$ scenarios, we construct QTree, 10K sets of information-seeking queries decomposed from various perspectives on certain topics. Using QTree, we train QPlanner, a 7B language model that generates customized query outlines following coverage-conditioned queries. We analyze the effectiveness of the generated outlines through automatic and human evaluation, targeting retrieval-augmented generation (RAG). Moreover, the experimental results demonstrate that QPlanner with alignment training can further provide outlines that satisfy diverse user interests.

Resource (QTree)

Train set

Number of samples: 10,580 [LINK]
- Note: There are three more samples than those specified in the paper.
Configuration
- question: Base query ($q_{base}$)
- instruction: Coverage query ($q_{cov}$)
- background: Background query
- intention: Intent operation (include/exclude)
- tree: QTree (a hierarchical set of queries)
- candidates: Three candidate query outlines extracted by an LLM (each outline is four subqueries from QTree)
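
The field list above can be checked mechanically before training. The following is a minimal sketch, assuming each train sample is available as a Python dict keyed by the field names listed above, that `intention` is stored as the string `include` or `exclude`, and that `candidates` is a list of three outlines with four subqueries each. The on-disk format, the shape of `tree`, the helper name `check_train_record`, and all values in `example_record` are hypothetical placeholders, not taken from the released data.

```python
from typing import Any

# Field names as listed in the train set configuration above.
TRAIN_FIELDS = ("question", "instruction", "background",
                "intention", "tree", "candidates")


def check_train_record(record: dict[str, Any]) -> list[str]:
    """Return a list of problems found in one train record (empty if none)."""
    problems = [f"missing field: {name}"
                for name in TRAIN_FIELDS if name not in record]
    # Assumption: the intent operation is stored as a plain string.
    if record.get("intention") not in ("include", "exclude"):
        problems.append("intention should be 'include' or 'exclude'")
    # Assumption: three outlines, each a list of four subqueries.
    candidates = record.get("candidates", [])
    if len(candidates) != 3:
        problems.append("expected three candidate outlines")
    for i, outline in enumerate(candidates):
        if len(outline) != 4:
            problems.append(f"candidate {i} should hold four subqueries")
    return problems


# Hypothetical record with placeholder text; the nested-dict tree is only
# one possible way to represent a hierarchical set of queries.
example_record = {
    "question": "<base query>",
    "instruction": "<coverage query>",
    "background": "<background query>",
    "intention": "include",
    "tree": {"<base query>": {"<subquery 1>": {}, "<subquery 2>": {}}},
    "candidates": [["<sq a>", "<sq b>", "<sq c>", "<sq d>"]] * 3,
}

print(check_train_record(example_record))
```

A record that passes returns an empty list, so the check can be run over the whole train set to collect only the samples that need attention.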

Test set

Number of samples: 300 [LINK]
Configuration
- question: Base query ($q_{base}$)
- instruction: Coverage query ($q_{cov}$)
- background: Background query
- intention: Intent operation (include/exclude)
- tree: QTree (a hierarchical set of queries)
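
Because every test sample carries an intent operation, it can be convenient to report results separately for the include and exclude cases. The sketch below is one way to split the samples, assuming they are already loaded as a list of dicts with the fields listed above; the function name `group_by_intention` and the placeholder records are hypothetical.

```python
from collections import defaultdict
from typing import Any


def group_by_intention(records: list[dict[str, Any]]) -> dict[str, list[dict[str, Any]]]:
    """Bucket records by the value of their 'intention' field."""
    groups: dict[str, list[dict[str, Any]]] = defaultdict(list)
    for record in records:
        groups[record["intention"]].append(record)
    return dict(groups)


# Placeholder records; only the 'intention' field matters for grouping.
test_records = [
    {"question": "<base query 1>", "intention": "include"},
    {"question": "<base query 2>", "intention": "exclude"},
    {"question": "<base query 3>", "intention": "include"},
]

for intent, items in group_by_intention(test_records).items():
    print(intent, len(items))
```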

Acknowledgement

Our QTree is based on seed queries from ASQA, Longform, and ExpertQA.
We thank 🤗alignment-handbook for providing an easy-to-use LM training framework!

Citation

@misc{kim2024learning,
      title={Learning to Explore and Select for Coverage-Conditioned Retrieval-Augmented Generation}, 
      author={Takyoung Kim and Kyungjae Lee and Young Rok Jang and Ji Yong Cho and Gangwoo Kim and Minseok Cho and Moontae Lee},
      year={2024},
      eprint={2407.01158},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2407.01158}, 
}

