feat:omegaPRM reproduced by openR: Process-supervision Data Generation（PRM） #1280

zjrwtx · 2024-12-05T09:05:01Z

…models||first experimental version

Description

add example：omegaPRM reproduced by openR: Process-supervision Data Generation（PRM）
reference：https://github.com/openreasoner/openr/tree/main/data

Motivation and Context

Why is this change required? What problem does it solve?
If it fixes an open issue, please link to the issue here.
You can use the syntax close #15213 if this solves the issue #15213

I have raised an issue to propose this change (required for new features and bug fixes)

Types of changes

What types of changes does your code introduce? Put an x in all the boxes that apply:

Bug fix (non-breaking change which fixes an issue)
New feature (non-breaking change which adds core functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation (update in the documentation)
Example (update in the folder of example)

Implemented Tasks

Subtask 1
Subtask 2
Subtask 3

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

I have read the CONTRIBUTION guide. (required)
My change requires a change to the documentation.
I have updated the tests accordingly. (required for a bug fix or a new feature)
I have updated the documentation accordingly.

Wendong-Fan

Thanks @zjrwtx , could we refactor this PR to move the core modules under camel folder? Just like how you did to the O1 data gen PR

zjrwtx · 2024-12-14T16:05:43Z

Thanks @zjrwtx , could we refactor this PR to move the core modules under camel folder? Just like how you did to the O1 data gen PR

yeah sure thanks！

willshang76

Please follow Camel code style by adding comments to all the classes and functions. For functions, please add types to the parameters and indicate the returned type as well. Thanks!

willshang76 · 2024-12-16T07:53:09Z

examples/omegaPRM_openR/gen_data.py

+
+
+def load_config(config_path):
+    """


please update all the comments format by using r""".

willshang76 · 2024-12-16T07:57:07Z

examples/omegaPRM_openR/model_utils.py

+load_dotenv()
+
+
+class LM:


Please add comment to explain the purpose of the class by following the same format of any existing camel class.

willshang76 · 2024-12-16T08:02:57Z

examples/omegaPRM_openR/model_utils.py

+For each step:
+1. Write down what you're calculating
+2. Show the calculation
+3. Explain the result
+Always show your work, even for simple calculations.
+End your solution with the final numerical answer.''',


Please format the code by using intents.

willshang76 · 2024-12-16T08:03:22Z

examples/omegaPRM_openR/model_utils.py

+Always show your work, even for simple calculations.
+End your solution with the final numerical answer.''',
+            model=self.model,
+            message_window_size=10,


Why do you limit the window size to 10?

willshang76 · 2024-12-16T08:05:13Z

examples/omegaPRM_openR/model_utils.py

+Current solution steps:
+{partial_answer}
+Continue the solution, showing all steps and calculations. 
+Make sure to explain each step:"""
+            else:
+                prompt = f"""Problem: {question}
+Please solve this step by step, showing all calculations and
+ explaining each step.
+Remember to:
+1. Break down the problem
+2. Show all calculations
+3. Explain each step
+4. End with the final numerical answer."""


Please format the code using intents.

willshang76 · 2024-12-16T08:06:31Z

examples/omegaPRM_openR/module.py

+from omegaprm_v2 import OmegaPRMV2
+
+
+class Node:


Please add comments for the class and all the functions.

willshang76 · 2024-12-16T08:15:30Z

Please fix all the failing tests. You could run pre-commit run --all-files locally to check the tests. documentation

harryeqs · 2024-12-16T08:28:24Z

examples/omegaPRM_openR/gen_data.py

+def main():
+    # Direct configuration instead of loading from yaml
+    config = {
+        'input': {'json_file_path': 'example_problems.json'},
+        'output': {
+            'file_prefix': 'example',
+            'log_file_path': 'example_processing.log',
+        },
+        'processing': {
+            'initial_rollouts': 30,  # 增加初始rollouts数量
+            'num_rollouts': 25,  # 增加每次迭代的rollouts数量
+            'max_iterations': 150,  # 增加最大迭代次数
+        },
+        'model': {
+            'model_type': 'camel',
+            'model_name': 'deepseek-chat',
+            'model_args': {
+                'max_tokens': 300,  # 增加最大token数
+                'temperature_range': [0.6, 0.9],  # 调整温度范围,增加多样性
+            },
+        },
+    }


Hi Yifeng please comment in English.

harryeqs

Thanks Yifeng! Shall we include a reference for code migrated from the openR repo?

zjrwtx · 2024-12-16T11:33:09Z

@willshang76 @harryeqs thanks！i will refactor this

AveryYay · 2024-12-17T20:10:39Z

examples/omegaPRM_openR/search_tree.py

+        self.latest_id_per_rollout[rollout_key] = unique_id
+        heapq.heappush(self.heap, entry)
+
+    def pop(self) -> Tuple[Optional[State], Optional[str]]:


It seems like it checks whether each unique_id in the heap is still valid, which might increase the computational overhead when repeatedly accessing and modifying the heap. Any chance we could optimize this? Maybe instead of deleting invalid entries immediately, mark them as "invalid" and skip them during the pop() process? We could probably also clean up periodically invalidated entries from entry_finder to prevent memory bloat?

zjrwtx requested a review from Wendong-Fan December 5, 2024 09:05

zjrwtx mentioned this pull request Dec 5, 2024

[Feature Request] omegaPRM reproduced by openR: data used to synthesize process reward models #1279

Open

2 tasks

zjrwtx self-assigned this Dec 5, 2024

Wendong-Fan added this to the Sprint 18 milestone Dec 5, 2024

zjrwtx changed the title ~~omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version~~ （draft）omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version Dec 6, 2024

zjrwtx marked this pull request as draft December 6, 2024 14:07

zjrwtx added the Data Related to camel data processing label Dec 6, 2024

add examples

12ec2ec

zjrwtx force-pushed the omegaPRM_openR branch from 6ad5d05 to 12ec2ec Compare December 13, 2024 11:45

refator to v2 version

dcd18b7

zjrwtx changed the title ~~（draft）omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version~~ omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version Dec 13, 2024

zjrwtx changed the title ~~omegaPRM reproduced by openR: data used to synthesize process reward models||first experimental version~~ add example：omegaPRM reproduced by openR: Process-supervision Data Generation（PRM） Dec 13, 2024

zjrwtx changed the title ~~add example：omegaPRM reproduced by openR: Process-supervision Data Generation（PRM）~~ add example：omegaPRM reproduced by openR: Process-supervision Data Generation（PRM）||experimental demo Dec 13, 2024

ready to review the example

60d95d3

zjrwtx marked this pull request as ready for review December 13, 2024 13:32

Wendong-Fan reviewed Dec 14, 2024

View reviewed changes

Wendong-Fan requested review from harryeqs, willshang76 and AveryYay December 14, 2024 17:41

Wendong-Fan linked an issue Dec 15, 2024 that may be closed by this pull request

[Feature Request] omegaPRM reproduced by openR: data used to synthesize process reward models #1279

Open

2 tasks

Merge branch 'master' into omegaPRM_openR

59293d1

willshang76 reviewed Dec 16, 2024

View reviewed changes

harryeqs reviewed Dec 16, 2024

View reviewed changes

zjrwtx marked this pull request as draft December 16, 2024 11:32

AveryYay reviewed Dec 17, 2024

View reviewed changes

Wendong-Fan marked this pull request as ready for review December 18, 2024 08:12

Merge branch 'master' into omegaPRM_openR

b57d759

zjrwtx marked this pull request as draft December 30, 2024 05:55

Merge branch 'master' into omegaPRM_openR

1f90dbc

zjrwtx changed the title ~~add example：omegaPRM reproduced by openR: Process-supervision Data Generation（PRM）||experimental demo~~ feat：omegaPRM reproduced by openR: Process-supervision Data Generation（PRM） Jan 2, 2025

zjrwtx changed the title ~~feat：omegaPRM reproduced by openR: Process-supervision Data Generation（PRM）~~ feat:omegaPRM reproduced by openR: Process-supervision Data Generation（PRM） Jan 2, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat:omegaPRM reproduced by openR: Process-supervision Data Generation（PRM） #1280

feat:omegaPRM reproduced by openR: Process-supervision Data Generation（PRM） #1280

zjrwtx commented Dec 5, 2024 •

edited

Loading

Wendong-Fan left a comment

zjrwtx commented Dec 14, 2024

willshang76 left a comment

willshang76 Dec 16, 2024

willshang76 Dec 16, 2024

willshang76 Dec 16, 2024

willshang76 Dec 16, 2024

willshang76 Dec 16, 2024

willshang76 Dec 16, 2024

willshang76 commented Dec 16, 2024

harryeqs Dec 16, 2024

harryeqs left a comment

zjrwtx commented Dec 16, 2024

AveryYay Dec 17, 2024

feat:omegaPRM reproduced by openR: Process-supervision Data Generation（PRM） #1280

Are you sure you want to change the base?

feat:omegaPRM reproduced by openR: Process-supervision Data Generation（PRM） #1280

Conversation

zjrwtx commented Dec 5, 2024 • edited Loading

Description

Motivation and Context

Types of changes

Implemented Tasks

Checklist

Wendong-Fan left a comment

Choose a reason for hiding this comment

zjrwtx commented Dec 14, 2024

willshang76 left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

willshang76 commented Dec 16, 2024

Choose a reason for hiding this comment

harryeqs left a comment

Choose a reason for hiding this comment

zjrwtx commented Dec 16, 2024

Choose a reason for hiding this comment

zjrwtx commented Dec 5, 2024 •

edited

Loading