This project evaluates the performance of different large language models (LLMs) on the Mirror-Consistency metric using various datasets. The experiments are conducted using four major LLMs and can be customized by altering the model or dataset configurations.
This work has been accepted to the Findings of EMNLP 2024 as a short paper.
We have utilized four LLMs for our experiments:

- **gpt3.5-turbo-0613** - To use this model, provide the corresponding API by replacing the `_gpt35_api` function in `model.py` (see the sketch after this list).
- **qwen-turbo** - This model also requires the corresponding API, which should be replaced in the `_qwen_turbo_api` function in `model.py`.
- **Llama3-8B-Instruct** - To use this model, download the Hugging Face version of the model parameters and set the `model_path` parameter in `run.py` to the path of the downloaded model weights.
- **Llama3-70B-Instruct** - As with the Llama3-8B model, make sure to download the model parameters and reference their path correctly in `run.py`.
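As a point of reference, here is a minimal sketch of what the `_gpt35_api` replacement in `model.py` might look like. Only the function name comes from this repository; the signature, the OpenAI SDK usage, and the parameters are illustrative assumptions, not the project's actual implementation.

```python
# Illustrative sketch only: the signature and SDK calls are assumptions,
# not the repository's actual _gpt35_api implementation.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def _gpt35_api(prompt: str, temperature: float = 0.7) -> str:
    """Send a single prompt to gpt-3.5-turbo-0613 and return the text reply."""
    response = client.chat.completions.create(
        model="gpt-3.5-turbo-0613",
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content
```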
To switch between these models, modify `config.model_name` in `run.py`.
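For example, switching to one of the Llama models might look like the following; apart from `config.model_name` and `model_path`, which the instructions above mention, the exact attribute layout and accepted strings are assumptions about the project's config.

```python
# Hypothetical values: check run.py for the exact names the config expects.
config.model_name = "Llama3-8B-Instruct"
config.model_path = "/path/to/Meta-Llama-3-8B-Instruct"  # only needed for the local Llama models
```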
Our experiments utilize the following datasets:

- **GSM8K**: A dataset of grade-school math word problems to test arithmetic reasoning.
- **SVAMP**: A dataset designed to test the robustness of mathematical problem-solving models.
- **Date Understanding**: A dataset focusing on the comprehension of date and time expressions in natural language.
- **StrategyQA**: A question-answering dataset that requires multi-hop reasoning and strategy.

To use a different dataset, update `config.dataset_name` in `run.py`.
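For instance (the exact dataset-name strings accepted by the config are assumptions; check `run.py` for the values it recognizes):

```python
# Hypothetical value: the accepted strings may differ in run.py.
config.dataset_name = "gsm8k"  # e.g. "svamp", "date_understanding", or "strategyqa"
```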
For Mirror-Consistency experiments:

- **Set Up the Model**: Follow the instructions in the Models section to configure the desired model.
- **Modify Parameters**: Adjust `run.py` with the desired model and dataset parameters.
- **Execute Script**: Run `python run.py` directly to start the experiments (see the example after this list). We also provide tools for detailed performance analysis in `complete_evaluate.py`.
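A typical run might therefore look like the commands below; invoking `complete_evaluate.py` as a standalone script is an assumption, so check the file for its actual interface.

```bash
# Start the Mirror-Consistency experiments with the model/dataset configured in run.py
python run.py

# Detailed performance analysis afterwards (assumed invocation)
python complete_evaluate.py
```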
`check_pipeline.ipynb`: A Jupyter notebook that serves as a simple example of the generation process using the configured models.