A comprehensive, expert-validated dataset of over 1,000 entries designed specifically for the field of chemistry. The dataset was curated by first generating and filtering a Question-Answer-Context (QAC) dataset with an automated GPT-4-based framework, followed by rigorous evaluation by chemistry experts. We also provide two supplementary datasets: ChemLit-QA-neg, focused on negative data, and ChemLit-QA-multi, focused on multihop reasoning tasks for LLMs, further enhancing the resources available for advanced scientific research.
We provide the full ChemLit-QA dataset and its variants in this repository, as well as the exact train-test split of ChemLit-QA used in the fine-tuning task.
Field | Description |
---|---|
chunk | The text chunk from which the Question-Answer-Context (QAC) triple is generated. |
Reasoning_type | Expert-corrected reasoning type. Includes 7 categories: Explanatory, Comparative, Causal, Conditional, Analogical, Evaluative, Predictive |
Question | LLM-generated question |
Answer | Expert-corrected answer |
Difficulty | Expert-assigned difficulty. Includes 3 categories: Easy, Medium, Hard |
Context | Expert-corrected context. Contains the full sentences that support the answer. |
A_start_end | The start-end indices of the answer (most similar sentences) in the chunk |
similar_chunks | The top 6 most similar chunks to the given chunk in terms of cosine similarity |
Cluster_labels | 2-level hierarchical label describing the topic of this chunk |
ID | Identifier of the entry |
Answer Relevancy Scores_gpt-4o | How relevant the answer is to the question, assessed by GPT-4o |
Faithfulness Scores_gpt-4o | How faithful the answer is to the context, assessed by GPT-4o |
Hallucination Scores_gpt-4o | How much information in the answer is not mentioned in the context, assessed by GPT-4o |
Question Faithfulness Scores_gpt-4o | How faithful the question is to the context, assessed by GPT-4o |
SE_penalized | Penalized Semantic Entropy of the question |
Keywords | Keywords of the question |
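A minimal sketch of working with entries that follow this schema. The field names come from the table above; the entry values and the filtering logic are invented for illustration, not taken from the dataset:

```python
# Toy entries following the ChemLit-QA schema described above
# (values are invented; only the field names come from the dataset).
entries = [
    {"ID": "qa-001", "Reasoning_type": "Causal", "Difficulty": "Easy",
     "Question": "Why does the catalyst increase the rate?",
     "Answer": "Because it lowers the activation energy."},
    {"ID": "qa-002", "Reasoning_type": "Comparative", "Difficulty": "Hard",
     "Question": "How does solvent A compare to solvent B?",
     "Answer": "Solvent A is more polar."},
]

# Filter by the expert-assigned difficulty, e.g. to build a
# hard-question evaluation subset.
hard = [e for e in entries if e["Difficulty"] == "Hard"]
print([e["ID"] for e in hard])  # → ['qa-002']
```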
Metric | Mean ± std. dev |
---|---|
Answer Relevancy Score (GPT-4o) | 0.99 ± 0.02 |
Faithfulness Score (GPT-4o) | 0.99 ± 0.01 |
Hallucination Score (GPT-4o) | 0.0 ± 0.0 |
Question Faithfulness Score (GPT-4o) | 0.93 ± 0.10 |
Penalized semantic entropy (GPT-4o) | 0.20 ± 0.44 |
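The mean ± standard deviation figures above can be reproduced from the per-entry score columns with the standard library. The scores below are placeholders, since the real values live in the dataset files:

```python
from statistics import mean, stdev

# Placeholder per-entry scores standing in for one of the
# "... Scores_gpt-4o" columns of ChemLit-QA.
scores = [0.99, 1.0, 0.97, 1.0, 0.99]

mu, sigma = mean(scores), stdev(scores)
print(f"{mu:.2f} ± {sigma:.2f}")  # → 0.99 ± 0.01
```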