
Evaluation functions #94

Merged: 8 commits merged into main from edwinb12-83-auto-scoring-for-eval on Aug 30, 2024

Conversation

EdwinB12 (Collaborator)

Very simple application of using an evaluation function in prompto.

Check out this pull request on ReviewNB to see visual diffs and provide feedback on Jupyter Notebooks. (Powered by ReviewNB)

rchan26 linked an issue Aug 19, 2024 that may be closed by this pull request
EdwinB12 (Collaborator, Author)

This would work better as a method in Experiment; the user would run it outside of experiment.process().

EdwinB12 (Collaborator, Author)

The restriction on the passed function is that it must take in a prompt dictionary and return a prompt dictionary.
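
A minimal sketch of a function matching that contract. The key names used here ("response", "expected_response", "exact_match") are assumptions for the example, not names required by prompto:

```python
def exact_match(prompt_dict: dict) -> dict:
    """Score whether the model response exactly matches an expected answer.

    Takes a prompt dictionary, adds a score key, and returns the dictionary.
    """
    response = str(prompt_dict.get("response", "")).strip().lower()
    expected = str(prompt_dict.get("expected_response", "")).strip().lower()
    prompt_dict["exact_match"] = response == expected
    return prompt_dict
```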

EdwinB12 (Collaborator, Author)

This should support a list/tuple of functions. Arguments are not supported; the user is encouraged to use the prompt dictionary to parameterise instead.
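
A sketch of that convention: any per-prompt parameter lives in the prompt dictionary itself, and a list/tuple of functions is applied in sequence. The "min_length" key and the apply_evaluations helper are illustrative only, not part of prompto:

```python
def length_check(prompt_dict: dict) -> dict:
    """Flag responses shorter than a per-prompt minimum length.

    The threshold is read from the prompt dictionary rather than passed as
    an extra argument ("min_length" is an assumed, example-only key).
    """
    min_length = int(prompt_dict.get("min_length", 1))
    prompt_dict["long_enough"] = len(str(prompt_dict.get("response", ""))) >= min_length
    return prompt_dict


def apply_evaluations(prompt_dict: dict, funcs) -> dict:
    """Apply a list/tuple of evaluation functions to one prompt dictionary."""
    for func in funcs:
        prompt_dict = func(prompt_dict)
    return prompt_dict
```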

EdwinB12 marked this pull request as ready for review August 28, 2024 17:15
EdwinB12 requested a review from rchan26 August 28, 2024 17:16
EdwinB12 (Collaborator, Author)

This has ended up being a very bare-bones implementation, and I'm not sure what value it actually adds over just running an evaluation function on the completed responses dictionary saved to disk after calling .process().
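
For comparison, the manual alternative described above would look roughly like this, assuming the completed responses are saved as a JSONL file with one prompt dictionary per line; the file path in the usage comment is a placeholder, not a path prompto guarantees:

```python
import json


def evaluate_completed_file(path: str, funcs) -> list[dict]:
    """Apply evaluation functions to each completed prompt dictionary on disk."""
    scored = []
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            prompt_dict = json.loads(line)
            for func in funcs:
                prompt_dict = func(prompt_dict)
            scored.append(prompt_dict)
    return scored


# e.g. scored = evaluate_completed_file("output/completed-experiment.jsonl", [exact_match])
```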

rchan26 (Collaborator) left a comment

Thanks @EdwinB12 - looks great! This will definitely be useful in practice, since having to manually run an evaluation over the responses after the fact is not ideal. In the future, this will become a CLI command like the judge one.

I'll merge this and add documentation pages for it.

@codecov-commenter

Codecov Report

Attention: Patch coverage is 80.00000% with 2 lines in your changes missing coverage. Please review.

Project coverage is 52.41%. Comparing base (cf15ce4) to head (824734a).
Report is 13 commits behind head on main.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/prompto/experiment.py | 80.00% | 2 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff             @@
##             main      #94       +/-   ##
===========================================
+ Coverage   35.67%   52.41%   +16.74%     
===========================================
  Files          38       38               
  Lines        1962     1984       +22     
===========================================
+ Hits          700     1040      +340     
+ Misses       1262      944      -318     

rchan26 merged commit a05506f into main Aug 30, 2024
6 checks passed
rchan26 deleted the edwinb12-83-auto-scoring-for-eval branch August 30, 2024 08:16
Labels: None yet
Projects: None yet
Development: Successfully merging this pull request may close these issues: Add automatic scoring functionality for evaluation
3 participants