-
Notifications
You must be signed in to change notification settings - Fork 609
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Created complete
tool to allow unsure answers
#684
Conversation
ba3c627
to
44cd66e
Compare
44cd66e
to
f381263
Compare
f381263
to
d4e2221
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for doing this - I think this is the best approach. Assuming perf looks same, this LGTM!
4f7782a
to
cccdaff
Compare
Thanks @whitead for the LGTM, appreciated. @mskarlin and I were discussing today, and we figured out we can get rid of the In other words, we're planning to move the "unsure" definition from the environment to the agent. |
…erateAnswer parsing complexity
cccdaff
to
1a975d0
Compare
I decided to resolve the future comments in another PR. Going to merge this one as it's a somewhat atomic change |
Motivation
Part 1
Based on our
Environment.step
'sdone
condition: https://github.com/Future-House/paper-qa/blob/v5.4.0/paperqa/agents/env.py#L199-L205We currently (as of v5.4) incorrectly conclude a rollout is
done
ongen_answer
'sanswer
s such as:Part 2
We realized that #671 is:
tool_choice="auto"
(OpenAI default). An example agent here is LangChain'sOpenAIFunctionsAgent
tool_choice="required"
, since they cannot specify empty tool calls. This includes ourldp
0.14Agent
s oraviary
0.10ToolSelector
Part 3
In general, empty tool calls signifying
done
is probably not a generalized assumption.Implementation
To be generally applicable, here we introduce another tool, the
complete
tool. When invoked, this tool signifies the rollout isdone
. This enables:PaperQAEnvironment.step
such that it doesn't special casedone
logic for empty tool calls, which in turn enables simplification of thePaperQAEnvironment.reset
observationscomplete
are now clear truncations or failuresNotably this change also simplifies
GradablePaperQAEnvironment.step
, now we can directly parse thestate.session.answer
whendone
, as opposed to parsing themessages
.The tradeoffs here are due to a fifth tool being added, which: