docs: Reward model cookbook #1332
base: master
There are some usage issues with the Slackbot section that require further discussion with other team members.
Thanks @Asher-hss, could you add more detail about how you will implement the reward model with the Slack bot?
Thanks @Asher-hss! Here is our cookbook guideline, FYR: https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#contributing-to-the-cookbook-writing-
For the reward model cookbook, I think we can make a more systematic cookbook that showcases generating RLHF data as described in issue #1216. If the goal is just to showcase usage of the reward model, the volume of generated data could be reduced and more content added to the reward model filtering part. The current cookbook seems to devote only 20-30% of its content to the reward model.
Thanks @Wendong-Fan, I think the current cookbook can focus solely on demonstrating how to use the reward model, and an additional cookbook can be created to showcase the RLHF training process.
- Please check out the template and add the Colab URL link, the star-the-repo part, etc.
- The conclusion part has not been added.
thresholds = {"helpfulness": 2.5, "correctness": 2.5}
Add a description of why the thresholds are set like this, show and explain the filtered data, and show how well the reward model helped with filtering the data.
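To make the request concrete, here is a minimal sketch of how per-attribute thresholds like the ones quoted above could be applied and the result reported. Only the `thresholds` dict comes from the notebook. The `passes_thresholds` helper, the `scored_items` data, and the choice to treat a missing attribute as a failure are all hypothetical and not part of the PR or of the camel API.

```python
from typing import Dict

# Thresholds quoted from the notebook under review.
thresholds = {"helpfulness": 2.5, "correctness": 2.5}


def passes_thresholds(scores: Dict[str, float], thresholds: Dict[str, float]) -> bool:
    """Return True only if every thresholded attribute meets its minimum.

    A missing attribute counts as a failure (an assumption of this sketch).
    """
    return all(
        scores.get(name, float("-inf")) >= minimum
        for name, minimum in thresholds.items()
    )


# Hypothetical scored items; the numbers are invented for illustration only.
scored_items = [
    {"id": "a", "scores": {"helpfulness": 3.1, "correctness": 3.4}},
    {"id": "b", "scores": {"helpfulness": 1.2, "correctness": 3.0}},
    {"id": "c", "scores": {"helpfulness": 2.5, "correctness": 2.5}},
    {"id": "d", "scores": {"helpfulness": 3.0}},
]

kept = [item for item in scored_items if passes_thresholds(item["scores"], thresholds)]
dropped = [item for item in scored_items if not passes_thresholds(item["scores"], thresholds)]

kept_ids = [item["id"] for item in kept]
dropped_ids = [item["id"] for item in dropped]
print(f"kept {len(kept)}/{len(scored_items)}: {kept_ids}")
print(f"dropped: {dropped_ids}")
```

Printing the kept and dropped items side by side, with the scores that caused each rejection, would give readers the explanation of the filtered data that this comment asks for.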
Thanks @Asher-hss, I left some comments in the Colab notebook. Please refer to https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#contributing-to-the-cookbook-writing- and update docs/index.rst and docs/README.md.
Thanks @Asher-hss, left some comments.
"metadata": {},
"outputs": [],
"source": [
"!pip install \"camel-ai[all]==0.2.12\""
Shall we update the camel version to 0.2.14?
Sure
Thanks @koch3092, the latest paper mentioned in the previous Colab link has not yet been uploaded to the PR.
"\n",
"from typing import List\n",
"from camel.loaders import Firecrawl\n",
"from camel.models import ModelFactory\n",
"from camel.types import ModelPlatformType, ModelType\n",
"from camel.configs import ChatGPTConfig\n",
"from camel.agents import ChatAgent\n",
"import json\n",
Clean up unused code.
" alpaca_items = [n_item.item for n_item in\n",
" AlpacaItemResponse.\n",
" model_validate_json(response.msgs[0].content).items]\n",
- " alpaca_items = [n_item.item for n_item in\n",
- " AlpacaItemResponse.\n",
- " model_validate_json(response.msgs[0].content).items]\n",
+ " n_items = AlpacaItemResponse.model_validate_json(response.msgs[0].content).items\n",
+ " alpaca_items = [n_item.item for n_item in n_items]\n",
Description
Add a cookbook about Slackbot and the reward model.
Motivation and Context
Add a cookbook about Slackbot and the reward model.
Types of changes
What types of changes does your code introduce? Put an x in all the boxes that apply:
Implemented Tasks
Checklist
Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!