docs: Reward model cookbook #1332

Asher-hss · 2024-12-17T04:54:52Z

Description

Add a cookbook about Slackbot and the reward model.

Motivation and Context

Add a cookbook about Slackbot and the reward model.

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of example)

Implemented Tasks

Subtask 1
Subtask 2
Subtask 3

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide. (required)
My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.

Asher-hss · 2024-12-17T04:55:39Z

There are some usage issues with the Slackbot section that require further discussion with other team members.

Wendong-Fan · 2024-12-17T05:30:01Z

Thanks @Asher-hss, could you add more detail about how you will implement the reward model with the Slackbot?

Wendong-Fan

Thanks @Asher-hss! Here is our cookbook guideline FYR: https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#contributing-to-the-cookbook-writing-

For the reward model cookbook, I think we can make a more systematic cookbook that showcases generating RLHF data, as described in issue #1216. If the goal is just to showcase the usage of the reward model, the volume of generated data could be reduced and more content could be added for the reward-model filtering part; the current cookbook seems to devote only 20-30% of its content to the reward model.
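Issue #1216 is about generating RLHF data. One common way a reward model feeds into that, sketched here as an assumption rather than what the issue or the cookbook specifies, is to score several candidate responses per prompt and keep the highest- and lowest-scoring ones as a chosen/rejected preference pair. The record fields (`prompt`, `response`, `reward`) and `build_preference_pairs` are hypothetical names, not CAMEL API.

```python
from collections import defaultdict


def build_preference_pairs(candidates):
    """Turn reward-scored responses into one chosen/rejected pair per prompt.

    Hypothetical helper: each candidate is assumed to be a dict with
    "prompt", "response" and a scalar "reward" from a reward model.
    """
    by_prompt = defaultdict(list)
    for cand in candidates:
        by_prompt[cand["prompt"]].append(cand)

    pairs = []
    for prompt, group in by_prompt.items():
        if len(group) < 2:
            continue  # a pair needs at least two responses to compare
        best = max(group, key=lambda c: c["reward"])
        worst = min(group, key=lambda c: c["reward"])
        if best["reward"] == worst["reward"]:
            continue  # tied scores carry no preference signal
        pairs.append(
            {"prompt": prompt, "chosen": best["response"], "rejected": worst["response"]}
        )
    return pairs


# Toy, made-up scores purely to show the data flow.
scored = [
    {"prompt": "Q1", "response": "A", "reward": 0.9},
    {"prompt": "Q1", "response": "B", "reward": 0.2},
    {"prompt": "Q1", "response": "C", "reward": 0.5},
    {"prompt": "Q2", "response": "D", "reward": 0.7},  # only one response: skipped
]
print(build_preference_pairs(scored))
```

Pairing the best against the worst response per prompt is only one strategy; pairing every higher-scored response with every lower-scored one yields more (but more correlated) pairs.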

Asher-hss · 2024-12-19T05:22:36Z

Thanks @Wendong-Fan, I think the current cookbook can focus solely on demonstrating how to use the reward model, and an additional cookbook can be created to showcase the RLHF training process.

Wendong-Fan

Please check out the template and add the Colab link, the "star the repo" section, etc.
The conclusion section has not been added yet.
For thresholds = {"helpfulness": 2.5, "correctness": 2.5}, add a description of why we set the thresholds this way, and show and explain the filtered data: how well did the reward model help us filter the data?
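The review asks for the threshold rationale and for evidence of what the filter removed. A minimal sketch of that filter step, assuming each generated sample carries a per-metric score dict from the reward model (the record layout and `filter_by_thresholds` are hypothetical, not CAMEL API), keeps a sample only if every listed metric meets its threshold and reports the pass rate so the effect of the filter can be shown.

```python
# Threshold values copied from the review comment; metric names must match
# whatever keys the reward model's scores actually use.
thresholds = {"helpfulness": 2.5, "correctness": 2.5}


def filter_by_thresholds(records, thresholds):
    """Keep records whose scores meet every threshold; also return the pass rate.

    Hypothetical helper: each record is assumed to have a "scores" dict.
    A record missing a thresholded metric is treated as failing it.
    """
    kept = [
        r for r in records
        if all(r["scores"].get(metric, float("-inf")) >= cutoff
               for metric, cutoff in thresholds.items())
    ]
    pass_rate = len(kept) / len(records) if records else 0.0
    return kept, pass_rate


# Toy, made-up scores purely to show the filtering logic.
records = [
    {"id": 1, "scores": {"helpfulness": 3.1, "correctness": 2.9}},
    {"id": 2, "scores": {"helpfulness": 2.0, "correctness": 3.4}},
    {"id": 3, "scores": {"helpfulness": 2.7}},  # missing metric: rejected
]
kept, pass_rate = filter_by_thresholds(records, thresholds)
print([r["id"] for r in kept], pass_rate)
```

Whether a score exactly at the cutoff should pass (`>=` here versus `>`) is a design choice worth stating in the cookbook alongside the reason for choosing 2.5.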

Asher-hss · 2024-12-25T06:01:20Z

https://colab.research.google.com/drive/15Y4iDw_yeskG7ZXzqy6pnJDVtPjl3RVA

Wendong-Fan

Thanks @Asher-hss, I left some comments in the Colab notebook. Please refer to https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#contributing-to-the-cookbook-writing- and update docs/index.rst and docs/README.md.

koch3092

Thanks @Asher-hss, I left some comments.

koch3092 · 2024-12-30T08:30:32Z

docs/cookbooks/synthetic_data_evaluation&filter_with_reward_model.ipynb

+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "!pip install \"camel-ai[all]==0.2.12\""


Shall we update camel version to 0.2.14?

Asher-hss

Thanks @koch3092, the latest paper mentioned in the previous Colab link has not yet been uploaded to the PR.

koch3092 · 2024-12-30T08:33:28Z

docs/cookbooks/synthetic_data_evaluation&filter_with_reward_model.ipynb

+    "\n",
+    "from typing import List\n",
+    "from camel.loaders import Firecrawl\n",
+    "from camel.models import ModelFactory\n",
+    "from camel.types import ModelPlatformType, ModelType\n",
+    "from camel.configs import ChatGPTConfig\n",
+    "from camel.agents import ChatAgent\n",
+    "import json\n",


clean up unused code
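For the "clean up unused code" request, a rough way to find which imports a notebook never uses is to parse the code and compare imported names against referenced names. This is a hypothetical helper built only on the standard-library `ast` module; it assumes the notebook's code cells have been concatenated into plain Python (IPython lines such as `!pip install ...` must be removed first, since `ast.parse` rejects them), and it does not see names used only inside string annotations. A linter such as ruff (rule F401, unused import) does this more robustly.

```python
import ast


def find_unused_imports(source: str) -> list[str]:
    """Return names bound by import statements that are never read."""
    tree = ast.parse(source)
    imported: set[str] = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                # `import a.b` binds the top-level name `a` unless aliased.
                imported.add(alias.asname or alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            for alias in node.names:
                if alias.name != "*":  # star imports bind unknown names
                    imported.add(alias.asname or alias.name)
    # Every read of a plain name, including the `json` in `json.dumps(...)`
    # and names inside annotations, is an ast.Name node with a Load context.
    used = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    return sorted(imported - used)


# Example cell code (parsed only, never executed, so camel need not be installed).
cell_code = """
from typing import List
from camel.agents import ChatAgent
import json

def make_agent() -> ChatAgent:
    return ChatAgent("You are a helpful assistant.")

names: List[str] = []
"""
print(find_unused_imports(cell_code))
```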

koch3092 · 2024-12-30T08:37:29Z

docs/cookbooks/synthetic_data_evaluation&filter_with_reward_model.ipynb

+    "    alpaca_items = [n_item.item for n_item in\n",
+    "                    AlpacaItemResponse.\n",
+    "                    model_validate_json(response.msgs[0].content).items]\n",


Suggested change

-    "    alpaca_items = [n_item.item for n_item in\n",
-    "                    AlpacaItemResponse.\n",
-    "                    model_validate_json(response.msgs[0].content).items]\n",
+    "    n_items = AlpacaItemResponse.model_validate_json(response.msgs[0].content).items\n",
+    "    alpaca_items = [n_item.item for n_item in n_items]\n",

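The suggested change validates the JSON once and then unwraps the numbered entries in a separate, readable step. A self-contained sketch of the same two-step pattern with pydantic v2's `model_validate_json`; the three model classes below are assumed for illustration (the notebook's actual definitions are not part of this review), `extract_alpaca_items` is a hypothetical wrapper, and `raw` stands in for `response.msgs[0].content`.

```python
from pydantic import BaseModel


# Hypothetical schema, assumed for illustration only.
class AlpacaItem(BaseModel):
    instruction: str
    input: str
    output: str


class NumberedAlpacaItem(BaseModel):
    number: int
    item: AlpacaItem


class AlpacaItemResponse(BaseModel):
    items: list[NumberedAlpacaItem]


def extract_alpaca_items(content: str) -> list[AlpacaItem]:
    # Step 1: parse and validate the raw JSON once, keeping the numbered wrappers.
    n_items = AlpacaItemResponse.model_validate_json(content).items
    # Step 2: unwrap each numbered entry into a plain AlpacaItem.
    return [n_item.item for n_item in n_items]


raw = '{"items": [{"number": 1, "item": {"instruction": "Say hi", "input": "", "output": "Hi"}}]}'
alpaca_items = extract_alpaca_items(raw)
print(alpaca_items[0].output)  # prints: Hi
```

Splitting the expression this way also gives a natural place to inspect `n_items` (for example, checking the numbering) before the wrappers are discarded.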
review-notebook-app · 2024-12-30T13:33:30Z

Check out this pull request on ReviewNB

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

update

e61cba9

Wendong-Fan assigned Asher-hss Dec 18, 2024

Wendong-Fan requested review from koch3092, AveryYay and Wendong-Fan December 18, 2024 07:12

Wendong-Fan added the documentation Improvements or additions to documentation label Dec 18, 2024

Wendong-Fan added this to the Sprint 18 milestone Dec 18, 2024

Wendong-Fan changed the title ~~Slackbot & Reward model cookbook~~ docs: Slackbot & Reward model cookbook Dec 18, 2024

Wendong-Fan reviewed Dec 18, 2024

Asher-hss changed the title ~~docs: Slackbot & Reward model cookbook~~ docs: Reward model cookbook Dec 19, 2024

update

1bc18ab

Wendong-Fan reviewed Dec 24, 2024

Wendong-Fan reviewed Dec 26, 2024

koch3092 reviewed Dec 30, 2024

update

7b2beb5
