This is a simple implementation of RLHF based on the paper "Learning to summarize from human feedback", developed for the DS8008 Natural Language Processing course at Toronto Metropolitan University as part of its Data Science Master's (MSc) program.
The original raw data used for this experiment comes from Reddit posts, available from the links below.
Due to infrastructure limitations, we used the small version of the preprocessed data provided by Google:
- Preference dataset: gs://vertex-ai/generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/*.jsonl (Stored as datasets/preference_dataset.jsonl)
- Prompt dataset: gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/train/*.jsonl (Stored as datasets/prompt_dataset.jsonl)
- Test/Validation dataset: gs://vertex-ai/generative-ai/rlhf/text_small/reddit_tfds/val/*.jsonl (Stored as datasets/validate_dataset.jsonl)
These datasets are downloaded and stored under the datasets/ folder, as sketched below.
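The following is a minimal sketch of how the three splits could be pulled from the bucket into datasets/. It assumes the google-cloud-storage package is installed and the environment is authenticated with GCP (or the bucket is publicly readable); the local file naming here is illustrative, while the notebook stores them as the single files listed above.

```python
# Sketch: download the small preprocessed datasets from the GCS bucket
# into the local datasets/ folder. Assumes google-cloud-storage is installed
# and credentials are available (e.g. via GOOGLE_APPLICATION_CREDENTIALS).
import os
from google.cloud import storage

BUCKET = "vertex-ai"
PREFIXES = {
    "generative-ai/rlhf/text_small/summarize_from_feedback_tfds/comparisons/train/": "datasets/preference",
    "generative-ai/rlhf/text_small/reddit_tfds/train/": "datasets/prompt",
    "generative-ai/rlhf/text_small/reddit_tfds/val/": "datasets/validate",
}

client = storage.Client()
os.makedirs("datasets", exist_ok=True)

for prefix, local_stub in PREFIXES.items():
    for blob in client.list_blobs(BUCKET, prefix=prefix):
        if blob.name.endswith(".jsonl"):
            local_path = f"{local_stub}_{os.path.basename(blob.name)}"
            blob.download_to_filename(local_path)
            print(f"Downloaded gs://{BUCKET}/{blob.name} -> {local_path}")
```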
This project is not implemented to run on a local machine. It is implemented for Google Cloud Platform (GCP), specifically to run on Vertex AI. Follow the steps below to execute this project.
- Place the GCP key file under the keys/ folder (this is required to authenticate with the GCP project where the experiment will run; a minimal authentication sketch follows this list).
- Open the nlp_rlhf_project.ipynb file and follow the instructions.
- Please note that running this notebook will incur cost (budget approximately 400-600 CAD) and will take approximately 1 day 4 hours to complete the pipeline run with the current settings.
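As a rough sketch of the authentication step above, the key file placed under keys/ can be used to initialize the Vertex AI SDK as shown below. The key filename, project ID, and region are placeholders, not values from this repository; replace them with your own.

```python
# Sketch: authenticate with the service-account key under keys/ and
# initialize the Vertex AI SDK. Filename, project ID, and region are
# placeholders to be replaced with your own values.
import os
from google.cloud import aiplatform

KEY_PATH = "keys/gcp_key.json"      # placeholder key filename
PROJECT_ID = "your-gcp-project-id"  # placeholder project ID
REGION = "us-central1"              # placeholder region

# Point the Google client libraries at the key file, then initialize Vertex AI.
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = KEY_PATH
aiplatform.init(project=PROJECT_ID, location=REGION)
print("Vertex AI initialized for project:", PROJECT_ID)
```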
- Learning to summarize from human feedback (base paper) link
- Secrets of RLHF in Large Language Models, Part I: PPO link
- Secrets of RLHF in Large Language Models, Part II: Reward Modeling link
- Tutorial: Reinforcement Learning from Human Feedback (code implementation) link
- Google Cloud RLHF link
- Wangchunshu Zhou, Ke Xu, "Learning to compare for better training and evaluation of open domain natural language generation models", 2020, link
- Daniel M. Ziegler, Nisan Stiennon, Jeffrey Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, Geoffrey Irving, "Fine-tuning language models from human preferences", 2020, link