feat:omegaPRM reproduced by openR: Process-supervision Data Generation(PRM) #1280

Draft
wants to merge 6 commits into
base: master
from

Conversation

zjrwtx
Collaborator

@zjrwtx zjrwtx commented Dec 5, 2024

…models||first experimental version

Description

Add an example: OmegaPRM as reproduced by OpenR, for process-supervision data generation (PRM).
Reference: https://github.com/openreasoner/openr/tree/main/data

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

  • I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds core functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (update in the documentation)
  • Example (update in the folder of example)

Implemented Tasks

  • Subtask 1
  • Subtask 2
  • Subtask 3

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

  • I have read the CONTRIBUTION guide. (required)
  • My change requires a change to the documentation.
  • I have updated the tests accordingly. (required for a bug fix or a new feature)
  • I have updated the documentation accordingly.

@zjrwtx zjrwtx requested a review from Wendong-Fan December 5, 2024 09:05
@zjrwtx zjrwtx self-assigned this Dec 5, 2024
@Wendong-Fan Wendong-Fan added this to the Sprint 18 milestone Dec 5, 2024
@zjrwtx zjrwtx changed the title omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version (draft)omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version Dec 6, 2024
@zjrwtx zjrwtx marked this pull request as draft December 6, 2024 14:07
@zjrwtx zjrwtx added the Data Related to camel data processing label Dec 6, 2024
@zjrwtx zjrwtx changed the title (draft)omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version Dec 13, 2024
@zjrwtx zjrwtx changed the title omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version add example:omegaPRM reproduced by openR: Process-supervision Data Generation(PRM) Dec 13, 2024
@zjrwtx zjrwtx changed the title add example:omegaPRM reproduced by openR: Process-supervision Data Generation(PRM) add example:omegaPRM reproduced by openR: Process-supervision Data Generation(PRM)||experimental demo Dec 13, 2024
@zjrwtx zjrwtx marked this pull request as ready for review December 13, 2024 13:32
Member

@Wendong-Fan Wendong-Fan left a comment

Thanks @zjrwtx, could we refactor this PR to move the core modules under the camel folder, just like you did in the O1 data gen PR?

@zjrwtx
Collaborator Author

zjrwtx commented Dec 14, 2024

Thanks @zjrwtx, could we refactor this PR to move the core modules under the camel folder, just like you did in the O1 data gen PR?

Yeah, sure, thanks!

Collaborator

@willshang76 willshang76 left a comment

Please follow the CAMEL code style by adding docstrings to all classes and functions. For functions, please also add type hints to the parameters and annotate the return type. Thanks!



def load_config(config_path):
    """
Collaborator

Please update all the docstrings to use the r""" format.
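As a rough illustration of the two requests above (type hints plus a raw r""" docstring), a loader like load_config could look like the sketch below. The JSON file format and the Args/Returns layout are assumptions made for this example; the PR's actual loader and the exact CAMEL docstring conventions may differ.

```python
import json
from typing import Any, Dict


def load_config(config_path: str) -> Dict[str, Any]:
    r"""Load a configuration file into a dictionary.

    Args:
        config_path (str): Path to the configuration file. JSON is assumed
            here only to keep the sketch free of third-party dependencies.

    Returns:
        Dict[str, Any]: The parsed configuration.
    """
    # Read the whole file and parse it in one step.
    with open(config_path, "r", encoding="utf-8") as f:
        return json.load(f)
```

The r prefix makes the docstring a raw string, so any backslashes inside it are kept literally instead of being read as escape sequences.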

load_dotenv()


class LM:
Collaborator

Please add a docstring that explains the purpose of the class, following the format used by existing CAMEL classes.
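A class-level docstring along these lines might satisfy the request. This is only a sketch: the constructor parameters are hypothetical, borrowed from the config keys shown elsewhere in this PR (model_name, max_tokens, temperature_range), and the real LM class may take different arguments.

```python
from typing import Tuple


class LM:
    r"""Wrapper around a chat model used to sample solution rollouts.

    Args:
        model_name (str): Name of the backing model.
        max_tokens (int): Maximum number of tokens per generation.
            (default: :obj:`300`)
        temperature_range (Tuple[float, float]): Lower and upper bounds
            of the sampling temperature. (default: :obj:`(0.6, 0.9)`)
    """

    def __init__(
        self,
        model_name: str,
        max_tokens: int = 300,
        temperature_range: Tuple[float, float] = (0.6, 0.9),
    ) -> None:
        low, high = temperature_range
        # Reject an inverted range early instead of failing during sampling.
        if low > high:
            raise ValueError("temperature_range must be (low, high)")
        self.model_name = model_name
        self.max_tokens = max_tokens
        self.temperature_range = (low, high)
```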

Comment on lines +55 to +60
For each step:
1. Write down what you're calculating
2. Show the calculation
3. Explain the result
Always show your work, even for simple calculations.
End your solution with the final numerical answer.''',
Collaborator

Please format the code using proper indentation.

Always show your work, even for simple calculations.
End your solution with the final numerical answer.''',
model=self.model,
message_window_size=10,
Collaborator

Why do you limit the window size to 10?

Comment on lines +79 to +91
Current solution steps:
{partial_answer}
Continue the solution, showing all steps and calculations.
Make sure to explain each step:"""
else:
prompt = f"""Problem: {question}
Please solve this step by step, showing all calculations and
explaining each step.
Remember to:
1. Break down the problem
2. Show all calculations
3. Explain each step
4. End with the final numerical answer."""
Collaborator

Please format the code using proper indentation.
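One way to keep multi-line prompts indented with the surrounding code, without sending the leading whitespace to the model, is textwrap.dedent. The sketch below is an assumption about how the two prompts quoted in the diff could be restructured: the build_prompt name, the template constants, and the "Problem: {question}" first line of the continuation prompt (not visible in the quoted lines) are hypothetical.

```python
import textwrap
from typing import Optional

# Templates are dedented once, before any values are substituted, so that
# multi-line answers inserted later cannot change the common indentation.
_CONTINUE_TEMPLATE = textwrap.dedent(
    """\
    Problem: {question}
    Current solution steps:
    {partial_answer}
    Continue the solution, showing all steps and calculations.
    Make sure to explain each step:"""
)

_FRESH_TEMPLATE = textwrap.dedent(
    """\
    Problem: {question}
    Please solve this step by step, showing all calculations and
    explaining each step.
    Remember to:
    1. Break down the problem
    2. Show all calculations
    3. Explain each step
    4. End with the final numerical answer."""
)


def build_prompt(question: str, partial_answer: Optional[str] = None) -> str:
    r"""Build the rollout prompt for a fresh or partially solved problem.

    Args:
        question (str): The problem statement.
        partial_answer (Optional[str]): Solution steps generated so far.
            (default: :obj:`None`)

    Returns:
        str: The formatted prompt without source-code indentation.
    """
    if partial_answer:
        return _CONTINUE_TEMPLATE.format(
            question=question, partial_answer=partial_answer
        )
    return _FRESH_TEMPLATE.format(question=question)
```

Dedenting the template before calling format matters: if the values were interpolated first (for example with an f-string), a partial answer containing unindented lines would leave no common indentation for dedent to strip.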

from omegaprm_v2 import OmegaPRMV2


class Node:
Collaborator

Please add comments for the class and all the functions.

@willshang76
Collaborator

Please fix all the failing tests. You can run pre-commit run --all-files locally to check them; see the documentation.

Comment on lines +85 to +106
def main():
    # Direct configuration instead of loading from yaml
    config = {
        'input': {'json_file_path': 'example_problems.json'},
        'output': {
            'file_prefix': 'example',
            'log_file_path': 'example_processing.log',
        },
        'processing': {
            'initial_rollouts': 30,  # increase the number of initial rollouts
            'num_rollouts': 25,  # increase the number of rollouts per iteration
            'max_iterations': 150,  # increase the maximum number of iterations
        },
        'model': {
            'model_type': 'camel',
            'model_name': 'deepseek-chat',
            'model_args': {
                'max_tokens': 300,  # increase the maximum number of tokens
                'temperature_range': [0.6, 0.9],  # adjust the temperature range to increase diversity
            },
        },
    }
Collaborator

Hi Yifeng, please write the code comments in English.

Collaborator

@harryeqs harryeqs left a comment

Thanks Yifeng! Shall we include a reference for code migrated from the openR repo?

@zjrwtx zjrwtx marked this pull request as draft December 16, 2024 11:32
@zjrwtx
Collaborator Author

zjrwtx commented Dec 16, 2024

@willshang76 @harryeqs Thanks! I will refactor this.

        self.latest_id_per_rollout[rollout_key] = unique_id
        heapq.heappush(self.heap, entry)

    def pop(self) -> Tuple[Optional[State], Optional[str]]:
Collaborator

It seems that pop() checks whether each unique_id in the heap is still valid, which may add computational overhead when the heap is accessed and modified repeatedly. Any chance we could optimize this? Maybe, instead of deleting invalid entries immediately, we could mark them as "invalid" and skip them during pop()? We could probably also periodically clean up invalidated entries from entry_finder to prevent memory bloat.
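The suggestion above (lazy invalidation plus periodic cleanup) could be sketched roughly as follows. This is not the PR's implementation: the LazyPriorityQueue class, its compact_ratio parameter, and the (key, priority) return type are hypothetical, and the sketch only assumes that a newer entry for the same rollout key should supersede older ones.

```python
import heapq
import itertools
from typing import Dict, Hashable, List, Optional, Tuple


class LazyPriorityQueue:
    r"""Max-priority queue that invalidates superseded entries lazily.

    Args:
        compact_ratio (float): Rebuild the heap once it holds more than
            this many entries per live key. (default: :obj:`2.0`)
    """

    def __init__(self, compact_ratio: float = 2.0) -> None:
        self.heap: List[Tuple[float, int, Hashable]] = []
        self.latest_id: Dict[Hashable, int] = {}
        self.compact_ratio = compact_ratio
        self._counter = itertools.count()

    def push(self, key: Hashable, priority: float) -> None:
        unique_id = next(self._counter)
        # Recording the newest id implicitly marks older entries as stale.
        self.latest_id[key] = unique_id
        # Negate the priority because heapq implements a min-heap.
        heapq.heappush(self.heap, (-priority, unique_id, key))
        if len(self.heap) > self.compact_ratio * max(1, len(self.latest_id)):
            self._compact()

    def pop(self) -> Optional[Tuple[Hashable, float]]:
        while self.heap:
            neg_priority, unique_id, key = heapq.heappop(self.heap)
            # Stale entries are skipped here instead of being removed early.
            if self.latest_id.get(key) == unique_id:
                del self.latest_id[key]
                return key, -neg_priority
        return None

    def _compact(self) -> None:
        # Periodic cleanup: keep only the live entries and restore the heap.
        self.heap = [e for e in self.heap if self.latest_id.get(e[2]) == e[1]]
        heapq.heapify(self.heap)
```

With this layout a push stays O(log n) and stale entries cost one extra heappop each when they surface, while the occasional O(n) compaction bounds the heap to a constant multiple of the number of live keys.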

@Wendong-Fan Wendong-Fan marked this pull request as ready for review December 18, 2024 08:12
@zjrwtx zjrwtx marked this pull request as draft December 30, 2024 05:55
@zjrwtx zjrwtx changed the title add example:omegaPRM reproduced by openR: Process-supervision Data Generation(PRM)||experimental demo feat:omegaPRM reproduced by openR: Process-supervision Data Generation(PRM) Jan 2, 2025
@zjrwtx zjrwtx changed the title feat:omegaPRM reproduced by openR: Process-supervision Data Generation(PRM) feat:omegaPRM reproduced by openR: Process-supervision Data Generation(PRM) Jan 2, 2025
Labels
Data Related to camel data processing
Projects
Status: No status
Development

Successfully merging this pull request may close these issues.

[Feature Request] omegaPRM reproduced by openR: data used to synthesize process reward models
5 participants