Hi! Thanks for this amazing work and for open-sourcing the code!
Quick question: we're evaluating our SFT models and noticed that the reward values we obtain are quite different from those in Figure 4(a) of your paper. Since the figure caption mentions "normalized rewards", could you share what normalization method was used?
Also, we're wondering whether you used the same dataset combination as we did (helpful-base, helpful-online, and harmless-base from hh-rlhf, without the red-team data): https://huggingface.co/datasets/Anthropic/hh-rlhf
Thanks in advance! 🙏
Thank you for your interest in our work! Yes, we normalize rewards using statistics (mean and std) computed on the training dataset. For the harmless (R1) / helpful (R2) experiments, the values are R1: (-0.94732502, 1.92034349) and R2: (-0.01500361, 1.40736504).
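For concreteness, here is a minimal sketch of how those statistics would typically be applied, assuming standard z-score normalization (the exact point in the pipeline where the repo applies it may differ):

```python
# Normalization statistics (mean, std) reported above for the
# harmless (R1) / helpful (R2) experiments.
R1_MEAN, R1_STD = -0.94732502, 1.92034349
R2_MEAN, R2_STD = -0.01500361, 1.40736504

def normalize_reward(raw_reward: float, mean: float, std: float) -> float:
    """Z-score normalize a raw reward-model score (assumed convention)."""
    return (raw_reward - mean) / std

# Example: normalize a raw harmless (R1) score of -0.5.
print(normalize_reward(-0.5, R1_MEAN, R1_STD))
```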
For the training set, we use the entire hh-rlhf train split: load_dataset(hhrlhf_dataset_path, split='train'). For the eval set, we use the entire test split: load_dataset(hhrlhf_dataset_path, split='test'). You can check whether the red-team data is included under this loading strategy.
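A minimal loading sketch under those assumptions, where hhrlhf_dataset_path is assumed to point at the Anthropic/hh-rlhf dataset on the Hugging Face Hub:

```python
from datasets import load_dataset

# Assumed to resolve to the Anthropic/hh-rlhf dataset on the Hub.
hhrlhf_dataset_path = "Anthropic/hh-rlhf"

# Training set: the full 'train' split of the default configuration
# (no data_dir is specified). Whether the red-team data ends up included
# can be checked by inspecting the loaded dataset.
train_set = load_dataset(hhrlhf_dataset_path, split="train")

# Eval set: the full 'test' split, loaded the same way.
eval_set = load_dataset(hhrlhf_dataset_path, split="test")

print(len(train_set), len(eval_set))
```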