
Evaluation functions #94

Merged: 8 commits merged into main from edwinb12-83-auto-scoring-for-eval on Aug 30, 2024

Conversation

EdwinB12 (Collaborator)

Very simple application of using an evaluation function in prompto.

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks. (Powered by ReviewNB)

rchan26 linked an issue Aug 19, 2024 that may be closed by this pull request
EdwinB12 (Collaborator, Author)

This would work better as a method in Experiment; the user would run it outside of experiment.process().

EdwinB12 (Collaborator, Author)

The restriction on the passed function is that it must take in a prompt dictionary and return a prompt dictionary.
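
A minimal sketch of a function matching that contract. The key names used here ("response", "expected_response", "exact_match") are assumptions for the example, not names required by prompto:

```python
def exact_match(prompt_dict: dict) -> dict:
    """Score whether the model response exactly matches an expected answer.

    Takes a prompt dictionary, adds a score key, and returns the dictionary.
    """
    response = str(prompt_dict.get("response", "")).strip().lower()
    expected = str(prompt_dict.get("expected_response", "")).strip().lower()
    prompt_dict["exact_match"] = response == expected
    return prompt_dict
```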

EdwinB12 (Collaborator, Author)

This should support a list/tuple of functions. Arguments are not supported; the user is encouraged to use the prompt dictionary to parameterise instead.
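
A sketch of that convention: any per-prompt parameter lives in the prompt dictionary itself, and a list/tuple of functions is applied in sequence. The "min_length" key and the apply_evaluations helper are illustrative only, not part of prompto:

```python
def length_check(prompt_dict: dict) -> dict:
    """Flag responses shorter than a per-prompt minimum length.

    The threshold is read from the prompt dictionary rather than passed as
    an extra argument ("min_length" is an assumed, example-only key).
    """
    min_length = int(prompt_dict.get("min_length", 1))
    prompt_dict["long_enough"] = len(str(prompt_dict.get("response", ""))) >= min_length
    return prompt_dict


def apply_evaluations(prompt_dict: dict, funcs) -> dict:
    """Apply a list/tuple of evaluation functions to one prompt dictionary."""
    for func in funcs:
        prompt_dict = func(prompt_dict)
    return prompt_dict
```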

EdwinB12 marked this pull request as ready for review August 28, 2024 17:15
EdwinB12 requested a review from rchan26 August 28, 2024 17:16
EdwinB12 (Collaborator, Author)

This has ended up being a very bare-bones implementation, and I'm not sure what value it actually adds over just running an evaluation function on the completed responses dictionary saved to disk after calling .process().
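
For comparison, the manual alternative described above would look roughly like this, assuming the completed responses are saved as a JSONL file with one prompt dictionary per line; the file path in the usage comment is a placeholder, not a path prompto guarantees:

```python
import json


def evaluate_completed_file(path: str, funcs) -> list[dict]:
    """Apply evaluation functions to each completed prompt dictionary on disk."""
    scored = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            prompt_dict = json.loads(line)
            for func in funcs:
                prompt_dict = func(prompt_dict)
            scored.append(prompt_dict)
    return scored


# e.g. scored = evaluate_completed_file("output/completed-experiment.jsonl", [exact_match])
```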

rchan26 (Collaborator) left a comment

Thanks @EdwinB12 - looks great! This will definitely be useful in practice, since having to manually run an evaluation over the responses after the fact is not ideal. In the future, this will become a CLI command like the judge one.

I'll merge this and add documentation pages for it.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 52.41%. Comparing base (cf15ce4) to head (824734a).
Report is 13 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/prompto/experiment.py | 80.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #94       +/-   ##
===========================================
+ Coverage   35.67%   52.41%   +16.74%     
===========================================
  Files          38       38               
  Lines        1962     1984       +22     
===========================================
+ Hits          700     1040      +340     
+ Misses       1262      944      -318     

rchan26 merged commit a05506f into main Aug 30, 2024
6 checks passed
rchan26 deleted the edwinb12-83-auto-scoring-for-eval branch August 30, 2024 08:16
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: Add automatic scoring functionality for evaluation
3 participants