logprobs to assess the confidence of verdict prediction #1797

Open
Kefan-pauline opened this issue Dec 26, 2024 · 1 comment
Labels
question Further information is requested

Comments

Kefan-pauline commented Dec 26, 2024

[x] I checked the documentation and related resources and couldn't find an answer to my question.

Your Question

I would like to use logprobs to assess the confidence of verdict predictions in ragas. The implementation will likely use callbacks. Before implementing it, I would like to know whether anyone has already experimented with this and, if so, what your takeaways were.

My hope is to reduce the run-to-run non-determinism of the scores, e.g., by reducing hallucinations in verdict predictions (i.e., if the confidence of a verdict prediction is very low, rerun the prediction).
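To make the rerun idea concrete, here is a minimal sketch of what I have in mind. It is not ragas code: `predict` is a hypothetical callable that runs one verdict prediction and returns the verdict together with the logprob of the verdict token, and the `threshold` and `max_attempts` defaults are assumed values that would need tuning.

```python
import math
from typing import Callable, Tuple

# Hypothetical type: a callable that runs one verdict prediction and returns
# (verdict, logprob of the verdict token). Not part of the ragas API.
VerdictFn = Callable[[], Tuple[int, float]]


def verdict_confidence(logprob: float) -> float:
    """Convert a token log-probability into a probability in [0, 1]."""
    return math.exp(logprob)


def predict_with_retry(
    predict: VerdictFn,
    threshold: float = 0.8,  # assumed cutoff, tune per model and metric
    max_attempts: int = 3,   # assumed retry budget
) -> Tuple[int, float]:
    """Rerun the prediction while its confidence is below `threshold`.

    Returns the first verdict that clears the threshold, otherwise the most
    confident verdict seen across all attempts.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    verdict, logprob = predict()
    best_verdict, best_conf = verdict, verdict_confidence(logprob)
    attempts = 1
    while best_conf < threshold and attempts < max_attempts:
        verdict, logprob = predict()
        conf = verdict_confidence(logprob)
        attempts += 1
        if conf > best_conf:
            best_verdict, best_conf = verdict, conf
    return best_verdict, best_conf
```

Falling back to the most confident attempt keeps the retry budget bounded. Whether that actually reduces variance between runs is the open question I would like feedback on.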

@jjmachan
@shahules786

Kefan-pauline added the question label Dec 26, 2024
Member

jjmachan commented Jan 7, 2025

Hey @Kefan-pauline, that is a very good idea, and we would love to work with you and help implement it. We can get on a call to discuss this further if you want.

But I think the implementation will be similar to what we have in https://docs.ragas.io/en/stable/howtos/applications/_cost, which could serve as inspiration. We can discuss it more.
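As a rough illustration of a parser-style approach, the sketch below pulls candidate verdict logprobs out of a response payload. It assumes an OpenAI-style chat `logprobs.content` list, where each entry is a dict with `token` and `logprob` keys. The function name and the `verdict_tokens` default are hypothetical, and nothing here is existing ragas API.

```python
from typing import Any, Dict, List, Sequence


def extract_verdict_logprobs(
    logprobs_content: List[Dict[str, Any]],
    verdict_tokens: Sequence[str] = ("0", "1"),  # assumed verdict encoding
) -> List[float]:
    """Collect logprobs of tokens that look like verdicts, in output order.

    Naive heuristic: any token whose stripped text is in `verdict_tokens`
    counts, so a "0" or "1" elsewhere in the output would also be picked up.
    """
    found = []
    for entry in logprobs_content:
        token = str(entry.get("token", "")).strip()
        if token in verdict_tokens and "logprob" in entry:
            found.append(float(entry["logprob"]))
    return found
```

A real implementation would probably need to locate the tokens that follow each `verdict` key in the structured output rather than match on token text alone.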
