docs: Reward model cookbook #1332

Open
wants to merge 3 commits into
base: master

Asher-hss
Collaborator

Description

Add a cookbook about Slackbot and the reward model.

Motivation and Context

Add a cookbook about Slackbot and the reward model.

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

Implemented Tasks

  • Subtask 1
  • Subtask 2
  • Subtask 3

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.

@Asher-hss
Collaborator Author

There are some usage issues with the Slackbot section that require further discussion with other team members.

@Wendong-Fan
Member

Thanks @Asher-hss, could you add more detail about how you will implement the reward model with the Slack bot?

@Wendong-Fan added the documentation label on Dec 18, 2024
@Wendong-Fan added this to the Sprint 18 milestone on Dec 18, 2024
@Wendong-Fan changed the title from "Slackbot & Reward model cookbook" to "docs: Slackbot & Reward model cookbook" on Dec 18, 2024
Member

@Wendong-Fan left a comment

Thanks @Asher-hss! Here is our cookbook guideline, FYR: https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#contributing-to-the-cookbook-writing-

For the reward model cookbook, I think we can make a more systematic cookbook that showcases generating RLHF data, as described in issue #1216. If the goal is only to showcase usage of the reward model, the volume of generated data could be reduced and more content added for the reward model filtering part. The current cookbook seems to devote only 20-30% of its content to the reward model.

@Asher-hss
Collaborator Author

Thanks @Wendong-Fan, I think the current cookbook can focus solely on demonstrating how to use the reward model, and an additional cookbook can be created to showcase the RLHF training process.

@Asher-hss changed the title from "docs: Slackbot & Reward model cookbook" to "docs: Reward model cookbook" on Dec 19, 2024
Member

@Wendong-Fan left a comment

  1. Please check out the template and add the Colab URL link, the star-the-repo part, etc.
  2. The conclusion part has not been added.
  3. For thresholds = {"helpfulness": 2.5, "correctness": 2.5}, add a description of why we set the thresholds like this, show and explain the filtered data, and show how well the reward model helped us filter the data.
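
A minimal sketch of the kind of explanation item 3 asks for, assuming the reward model returns one score per attribute for each generated item. Only the `thresholds` dict comes from the notebook; the sample items, their scores, the helper `failed_attributes`, and the inclusive `>=` comparison are hypothetical and are not taken from the CAMEL API.

```python
from typing import Dict, List

# Thresholds quoted from the notebook under review.
thresholds: Dict[str, float] = {"helpfulness": 2.5, "correctness": 2.5}

# Hypothetical scored items; real scores would come from the reward model.
scored_items: List[dict] = [
    {"instruction": "Explain what a reward model does.",
     "scores": {"helpfulness": 3.4, "correctness": 3.1}},
    {"instruction": "Summarize the article.",
     "scores": {"helpfulness": 3.0, "correctness": 1.8}},
    {"instruction": "List the main API endpoints.",
     "scores": {"helpfulness": 2.5, "correctness": 2.9}},
    {"instruction": "Describe the pricing model.",
     "scores": {"helpfulness": 1.2, "correctness": 0.9}},
]


def failed_attributes(scores: Dict[str, float],
                      thresholds: Dict[str, float]) -> List[str]:
    """Return the attributes whose score is below its threshold.

    A missing attribute is treated as a failure (assumption).
    """
    return [
        name
        for name, minimum in thresholds.items()
        if scores.get(name, float("-inf")) < minimum
    ]


# An item is kept only if every thresholded attribute passes.
kept = [i for i in scored_items
        if not failed_attributes(i["scores"], thresholds)]
dropped = [i for i in scored_items
           if failed_attributes(i["scores"], thresholds)]

print(f"kept {len(kept)} of {len(scored_items)} items")
for item in dropped:
    reasons = ", ".join(failed_attributes(item["scores"], thresholds))
    print(f"dropped: {item['instruction']!r} (below threshold on: {reasons})")
```

Printing the dropped items together with the attribute that failed is one way to show readers what the filter removed and why, which is what the comment requests alongside a rationale for the chosen threshold values.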

@Asher-hss
Collaborator Author

Member

@Wendong-Fan left a comment

Thanks @Asher-hss, I left some comments in the Colab notebook. Please refer to https://github.com/camel-ai/camel/blob/master/CONTRIBUTING.md#contributing-to-the-cookbook-writing- and update docs/index.rst and docs/README.md.

Collaborator

@koch3092 left a comment

Thanks @Asher-hss, I left some comments.

"metadata": {},
"outputs": [],
"source": [
"!pip install \"camel-ai[all]==0.2.12\""
Collaborator

Shall we update the camel version to 0.2.14?

Collaborator Author

Sure

Collaborator Author

@Asher-hss Dec 30, 2024

Thanks @koch3092, the latest paper mentioned in the previous Colab link has not yet been uploaded to the PR.

Comment on lines 97 to 104
"\n",
"from typing import List\n",
"from camel.loaders import Firecrawl\n",
"from camel.models import ModelFactory\n",
"from camel.types import ModelPlatformType, ModelType\n",
"from camel.configs import ChatGPTConfig\n",
"from camel.agents import ChatAgent\n",
"import json\n",
Collaborator

clean up unused code

Comment on lines 148 to 150
" alpaca_items = [n_item.item for n_item in\n",
" AlpacaItemResponse.\n",
" model_validate_json(response.msgs[0].content).items]\n",
Collaborator

Suggested change
- " alpaca_items = [n_item.item for n_item in\n",
- " AlpacaItemResponse.\n",
- " model_validate_json(response.msgs[0].content).items]\n",
+ " n_items = AlpacaItemResponse.model_validate_json(response.msgs[0].content).items\n",
+ " alpaca_items = [n_item.item for n_item in n_items]\n",

Labels
documentation Improvements or additions to documentation

3 participants