Replies: 2 comments
-
I interpret this as being dependent on having a real interactive model rather than the data-creation tasks. It would require logging the conversation flows between users and a live model, then letting people go back through their own interaction logs and evaluate the results. I think it's a pretty good idea, but it's blocked on an interactive model.
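The log-then-review flow described above could look roughly like this. This is just a sketch: the class, field names, and 1-5 rating scale are illustrative assumptions, not an existing schema in the project.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional

@dataclass
class LoggedTurn:
    """One user/assistant exchange captured from a live model session.
    (Hypothetical structure for illustration.)"""
    session_id: str
    prompt: str
    reply: str
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))
    # Filled in later, when the user reviews their own interaction log:
    rating: Optional[int] = None        # e.g. a 1-5 quality score (assumed scale)
    better_reply: Optional[str] = None  # user-supplied improved answer, if any

def review(turn: LoggedTurn, rating: int,
           better_reply: Optional[str] = None) -> LoggedTurn:
    """Attach retrospective feedback to a previously logged turn."""
    turn.rating = rating
    turn.better_reply = better_reply
    return turn
```

The point is that evaluation happens after the fact on the user's own logs, so the live model and the feedback collection can be decoupled.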
-
I don't know enough about this to understand the answer, but I've had another idea regarding it. If the fine-tuning process is not too expensive, there could be multiple model versions, like ChatGPT is doing. That way, people could start using the assistant with the points they gain by collaborating. While this approach would be more expensive because of the multiple fine-tuning runs, it would likely result in more engagement and feedback from users.
-
I would like to use the assistant, and if I don't like an answer it gives, be able to come back to that same answer later once I've found a better one and submit the new answer as feedback. I'd also like to rate the assistant's answers on quality, helpfulness, etc.
Right now, replying as the assistant or classifying the assistant's replies is very difficult because they cover topics I don't know about.
I don't know whether RLHF is an ongoing process or whether all the data has to be collected before it can be used for training. But if it is an ongoing process, I think this approach is best, and more people would be interested in contributing if they could use the assistant.
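For what it's worth, the interleaved collect-and-train idea could be sketched as a buffer of preference pairs that signals when enough new feedback has accumulated to retrain a reward model. Everything here (class name, pair format, retraining threshold) is hypothetical, just to illustrate that collection and training can alternate rather than requiring all data up front:

```python
class PreferenceBuffer:
    """Accumulates (prompt, preferred, rejected) pairs as users submit
    better answers or ratings; signals when enough new pairs have
    arrived to justify another training round. (Illustrative only.)"""

    def __init__(self, retrain_every: int = 1000):
        self.pairs: list[tuple[str, str, str]] = []
        self.retrain_every = retrain_every
        self._since_last_train = 0

    def add(self, prompt: str, preferred: str, rejected: str) -> bool:
        """Record one feedback pair; return True when a retraining
        round should be triggered."""
        self.pairs.append((prompt, preferred, rejected))
        self._since_last_train += 1
        if self._since_last_train >= self.retrain_every:
            self._since_last_train = 0
            return True
        return False
```

Under this scheme, users keep interacting with the current assistant while their feedback accumulates, and the model is periodically retrained on the growing preference dataset.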