Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Created complete tool to allow unsure answers #684

Merged
merged 5 commits into from
Nov 15, 2024
Merged

Conversation

jamesbraza
Copy link
Collaborator

@jamesbraza jamesbraza commented Nov 13, 2024

Motivation

Part 1

Based on our Environment.step's done condition: https://github.com/Future-House/paper-qa/blob/v5.4.0/paperqa/agents/env.py#L199-L205

We currently (as of v5.4) incorrectly conclude a rollout is done on gen_answer's answers such as:

  • "Based on the sources provided, it appears no one has done x."
  • "The sources provide some evidence about gene X, but gene Y is not discussed."

Part 2

We realized that #671 is:

  • Useful for agents backed by LLM tool_choice="auto" (OpenAI default). An example agent here is LangChain's OpenAIFunctionsAgent
  • Not useful for agents backed by LLM tool_choice="required", since they cannot specify empty tool calls. This includes our ldp 0.14 Agents or aviary 0.10 ToolSelector

Part 3

In general, empty tool calls signifying done is probably not a generalized assumption.

Implementation

To be generally applicable, here we introduce another tool, the complete tool. When invoked, this tool signifies the rollout is done. This enables:

  • The unsure answers (see motivation part 1 above) to be non-terminal/intermediary outputs
  • Simplified PaperQAEnvironment.step such that it doesn't special case done logic for empty tool calls, which in turn enables simplification of the PaperQAEnvironment.reset observations
  • Trajectories not ending in complete are now clear truncations or failures

Notably this change also simplifies GradablePaperQAEnvironment.step, now we can directly parse the state.session.answer when done, as opposed to parsing the messages.

The tradeoffs here are due to a fifth tool being added, which:

  • Increases API prompt token usage
  • Increases the number of moving parts in the system

@jamesbraza jamesbraza added the enhancement New feature or request label Nov 13, 2024
@jamesbraza jamesbraza self-assigned this Nov 13, 2024
@dosubot dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 13, 2024
@jamesbraza jamesbraza force-pushed the fixing-unsure branch 2 times, most recently from ba3c627 to 44cd66e Compare November 13, 2024 23:16
paperqa/agents/env.py Outdated Show resolved Hide resolved
Copy link
Collaborator

@whitead whitead left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing this - I think this is the best approach. Assuming perf looks same, this LGTM!

@dosubot dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 14, 2024
@jamesbraza jamesbraza force-pushed the fixing-unsure branch 4 times, most recently from 4f7782a to cccdaff Compare November 15, 2024 07:04
@jamesbraza
Copy link
Collaborator Author

Thanks @whitead for the LGTM, appreciated.

@mskarlin and I were discussing today, and we figured out we can get rid of the "cannot answer" string literal checking in the code base by moving the complete tool to have a bool argument is_sure that basically plays the role of AgentStatus.UNSURE.

In other words, we're planning to move the "unsure" definition from the environment to the agent.

@jamesbraza
Copy link
Collaborator Author

I decided to resolve the future comments in another PR. Going to merge this one as it's a somewhat atomic change

@jamesbraza jamesbraza merged commit 9e8ed75 into main Nov 15, 2024
5 checks passed
@jamesbraza jamesbraza deleted the fixing-unsure branch November 15, 2024 21:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request lgtm This PR has been approved by a maintainer size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants