This project, developed as a final project for UPenn CIS6200 Advanced Topics in Deep Learning, explores the use of large language models (LLMs) for generating reviews of machine learning papers. It is inspired by "Can large language models provide useful feedback on research papers? A large-scale empirical analysis" and uses similar techniques. The project consists of two primary components: a review generation pipeline and an evaluation pipeline. The review generation pipeline uses a fine-tuned model alongside GPT-3.5-turbo and GPT-4-turbo baselines, while the evaluation pipeline compares the generated reviews with human-written ones to assess quality. For a full project description, see our project report.
The review generation pipeline supports two methods for generating reviews: fine-tuning and prompting pre-trained GPT models. The fine-tuning method uses our fine-tuned model, whereas the GPT method uses GPT-3.5-turbo or GPT-4-turbo. Text extraction from PDF files is handled by scipdf_parser.
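For reference, a minimal sketch of how scipdf_parser can be used to extract text from a paper (this assumes a GROBID backend is already running as described in the setup notes below; the file path is a placeholder, and the output keys are as we understand the package's parsed dictionary):

```python
import scipdf  # provided by the scipdf_parser package

# Parse a paper into a dictionary with fields such as abstract and sections.
article = scipdf.parse_pdf_to_dict("paper.pdf")  # placeholder path

abstract = article["abstract"]
full_text = "\n\n".join(
    f"{sec['heading']}\n{sec['text']}" for sec in article["sections"]
)
print(abstract[:500])
```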
We used GPT-3.5-turbo and GPT-4-turbo as baseline models to generate machine learning paper reviews. Our experiments also explored one-shot learning techniques with these models.
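As an illustration of the one-shot setup (not the project's exact prompt), a single example (abstract, review) pair is placed before the query, using the OpenAI Python SDK; the texts below are placeholders:

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# One-shot prompting: one example abstract/review pair precedes the target abstract.
example_abstract = "..."   # placeholder: abstract of an example paper
example_review = "..."     # placeholder: a human-written review of that paper
target_abstract = "..."    # placeholder: abstract of the paper to review

response = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[
        {"role": "system", "content": "You are an expert reviewer of machine learning papers."},
        {"role": "user", "content": f"Abstract:\n{example_abstract}\n\nWrite a review."},
        {"role": "assistant", "content": example_review},
        {"role": "user", "content": f"Abstract:\n{target_abstract}\n\nWrite a review."},
    ],
)
print(response.choices[0].message.content)
```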
We fine-tuned the Mistral-7B-Instruct-v0.2 model on a custom dataset of papers and their reviews from OpenReview. We used GPT-4 to generate summary reviews during the fine-tuning process. Due to computational constraints, the focus was primarily on paper abstracts. The dataset is available in the data directory and on the Huggingface dataset hub here. The fine-tuned model can be accessed on the Huggingface model hub here.
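Because the dataset and model are hosted on the Huggingface hub, they can be loaded with the standard datasets and transformers APIs; the repository IDs below are placeholders for the hub entries linked above:

```python
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholders: substitute the actual hub IDs linked above.
DATASET_ID = "<dataset_id>"
MODEL_ID = "<model_id>"

dataset = load_dataset(DATASET_ID)  # papers paired with summary reviews
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
```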
We conducted retrospective evaluations comparing the comment overlap in GPT-4 vs. Human and Human vs. Human setups. Metrics such as the Szymkiewicz–Simpson Overlap Coefficient, the Jaccard Index, and the Sørensen–Dice Coefficient were employed, demonstrating that the performance of GPT-4 vs. Human is comparable to Human vs. Human. This highlights the effectiveness of our model across different conditions and datasets.
For detailed evaluation methods, see the evaluation.ipynb notebook.
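All three metrics are simple functions of the matched comment sets; a minimal sketch of how they are computed (the construction of the comment sets from raw reviews is omitted, and the example sets are illustrative):

```python
def overlap_coefficient(a: set, b: set) -> float:
    """Szymkiewicz–Simpson: |A ∩ B| / min(|A|, |B|)."""
    return len(a & b) / min(len(a), len(b))

def jaccard_index(a: set, b: set) -> float:
    """Jaccard: |A ∩ B| / |A ∪ B|."""
    return len(a & b) / len(a | b)

def sorensen_dice(a: set, b: set) -> float:
    """Sørensen–Dice: 2|A ∩ B| / (|A| + |B|)."""
    return 2 * len(a & b) / (len(a) + len(b))

# Example with matched comment topics from two reviews:
gpt4_comments = {"novelty", "missing baselines", "clarity"}
human_comments = {"missing baselines", "clarity", "reproducibility"}
print(jaccard_index(gpt4_comments, human_comments))  # 0.5
```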
The project provides two pipelines for generating reviews:
- Model Pipeline: Utilizes the fine-tuned model for review generation.
- GPT Pipeline: Generates reviews using GPT-3.5-turbo or GPT-4-turbo.
Clone the repository and set up the environment:
git clone git@github.com:yinuotxie/MLPapersReviewGPT.git
cd MLPapersReviewGPT
python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
Note: The scipdf_parser package, required for PDF text extraction, relies on a GROBID service that runs in a Docker container. Instructions are available in the scipdf_parser repository.
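For example, starting a GROBID container typically looks like the command below; the image tag may differ, so follow the scipdf_parser repository's instructions for the exact setup:

docker run --rm -p 8070:8070 lfoppiano/grobid:0.8.0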
Generate reviews using our fine-tuned model. Currently, only the abstracts of papers are supported:
python model_review.py \
    --pdf_file <path_to_pdf_file> \
    --device <device> \
    --model_id <model_id> \
    --quantize
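For example, with a local PDF and the fine-tuned model from the hub (the PDF path is a placeholder, and <model_id> refers to the model on the Huggingface model hub):

python model_review.py --pdf_file ./papers/example.pdf --device cuda --model_id <model_id> --quantize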
Alternatively, use the GPT pipeline to generate reviews:
python gpt_review.py \
    --pdf_file <path_to_pdf_file> \
    --openai_api_key <your_openai_api_key> \
    --model <gpt-3.5-turbo or gpt-4-turbo> \
    --method <full or abstract> \
    --one_shot
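For example, to generate an abstract-only review with GPT-4-turbo and a one-shot example (the PDF path and API key are placeholders):

python gpt_review.py --pdf_file ./papers/example.pdf --openai_api_key <your_openai_api_key> --model gpt-4-turbo --method abstract --one_shot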
You can also check the inferece.ipynb notebook for more details.
We provide a UI for visualizing and comparing reviews generated by the GPT models and our fine-tuned model. To start the UI, create a .env file containing OPEN_AI_KEY and HF_TOKEN (Huggingface token). Then run the app:
python app.py
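For reference, a minimal .env file might look like this (values are placeholders):

OPEN_AI_KEY=<your_openai_api_key>
HF_TOKEN=<your_huggingface_token>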
If any dependencies are missing, please refer to the documentation included in app.py for more details.
We extend our deepest gratitude to our professor, Prof. Lyle Ungar, for his invaluable guidance and support throughout the project. We also thank the teaching assistants, Visweswaran Baskaran, Haotong (Victor) Tian, and Royina Karegoudra Jayanth, for their helpful feedback and assistance.