Created `complete` tool to allow unsure answers #684

jamesbraza · 2024-11-13T23:10:09Z

Motivation

Part 1

Based on our Environment.step's done condition: https://github.com/Future-House/paper-qa/blob/v5.4.0/paperqa/agents/env.py#L199-L205

As of v5.4, we incorrectly conclude a rollout is done when gen_answer produces answers such as:

- "Based on the sources provided, it appears no one has done x."
- "The sources provide some evidence about gene X, but gene Y is not discussed."

Part 2

We realized that #671 is:

- Useful for agents backed by an LLM with tool_choice="auto" (the OpenAI default). An example agent here is LangChain's OpenAIFunctionsAgent.
- Not useful for agents backed by an LLM with tool_choice="required", since they cannot specify empty tool calls. This includes our ldp 0.14 Agents and aviary 0.10 ToolSelector.
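To make the distinction above concrete, here is a minimal sketch of why an "empty tool calls means done" convention is unreachable under tool_choice="required". It does not call any real LLM API: `LLMResponse`, `validate_response`, and `is_done_by_empty_calls` are hypothetical names invented for illustration, and the validation only mimics the contract that "required" forces at least one tool call.

```python
from dataclasses import dataclass, field


@dataclass
class LLMResponse:
    """Hypothetical stand-in for an LLM response carrying tool call names."""

    tool_calls: list = field(default_factory=list)


def validate_response(response: LLMResponse, tool_choice: str) -> LLMResponse:
    """Mimic the tool_choice contract: 'required' forbids an empty tool call list."""
    if tool_choice == "required" and not response.tool_calls:
        raise ValueError("tool_choice='required' needs at least one tool call")
    return response


def is_done_by_empty_calls(response: LLMResponse) -> bool:
    """The convention under discussion: no tool calls signals the rollout is done."""
    return not response.tool_calls
```

Under "auto" an empty response passes validation and signals done, while under "required" the same response is rejected, so the agent has no way to express that it is finished.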

Part 3

In general, treating empty tool calls as the done signal is probably not a generalizable assumption.

Implementation

To be generally applicable, here we introduce another tool, the complete tool. When invoked, this tool signifies the rollout is done. This enables:

- The unsure answers (see motivation part 1 above) to be non-terminal/intermediary outputs
- A simplified PaperQAEnvironment.step that no longer special-cases done logic for empty tool calls, which in turn simplifies the PaperQAEnvironment.reset observations
- Trajectories not ending in complete to be recognized as clear truncations or failures
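The behavior described above can be sketched roughly as follows. This is not the code from paperqa/agents/env.py: `SessionState`, `TOOLS`, and this `step` signature are simplified, hypothetical stand-ins, assuming only that the done flag is set by the complete tool and nothing else.

```python
from dataclasses import dataclass


@dataclass
class SessionState:
    """Hypothetical, simplified environment state."""

    answer: str = ""
    done: bool = False


def gen_answer(state: SessionState, text: str) -> str:
    """Record an answer; never terminal, even if the answer is unsure."""
    state.answer = text
    return text


def complete(state: SessionState) -> str:
    """Terminal tool: invoking it is the only way to end the rollout."""
    state.done = True
    return "Complete"


TOOLS = {"gen_answer": gen_answer, "complete": complete}


def step(state: SessionState, tool_calls: list) -> tuple:
    """Run each (name, args) tool call; done depends only on the complete tool."""
    observations = [TOOLS[name](state, *args) for name, args in tool_calls]
    return observations, state.done
```

In this sketch an unsure gen_answer output and an empty list of tool calls both leave the rollout running, so a trajectory that stops without complete can be read as a truncation or failure.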

Notably, this change also simplifies GradablePaperQAEnvironment.step: when done, we can now directly parse state.session.answer, as opposed to parsing the messages.

The tradeoffs here are due to a fifth tool being added, which:

- Increases API prompt token usage
- Increases the number of moving parts in the system

paperqa/agents/env.py

whitead

Thanks for doing this - I think this is the best approach. Assuming perf looks the same, this LGTM!

jamesbraza · 2024-11-15T07:24:44Z

Thanks @whitead for the LGTM, appreciated.

@mskarlin and I were discussing this today, and we figured out that we can get rid of the "cannot answer" string literal checking in the code base by giving the complete tool a bool argument is_sure, which basically plays the role of AgentStatus.UNSURE.

In other words, we're planning to move the "unsure" definition from the environment to the agent.
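A rough sketch of that planned follow-up, assuming the agent passes its own confidence to the complete tool. This is not the merged implementation: `SketchStatus` is a hypothetical enum standing in for AgentStatus, and `SessionState` is a simplified placeholder.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SketchStatus(Enum):
    """Hypothetical stand-in for AgentStatus."""

    SUCCESS = "success"
    UNSURE = "unsure"


@dataclass
class SessionState:
    """Hypothetical, simplified environment state."""

    answer: str = ""
    status: Optional[SketchStatus] = None


def complete(state: SessionState, is_sure: bool) -> str:
    """Terminal tool: the agent, not the environment, decides whether it is unsure."""
    state.status = SketchStatus.SUCCESS if is_sure else SketchStatus.UNSURE
    return f"Complete (is_sure={is_sure})"
```

With this shape, the environment never needs to inspect the answer text for phrases like "cannot answer"; it only reads the status the agent reported.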

jamesbraza · 2024-11-15T21:41:32Z

I decided to resolve the future comments in another PR. Going to merge this one as it's a somewhat atomic change.

jamesbraza added the enhancement New feature or request label Nov 13, 2024

jamesbraza requested review from sidnarayanan, mskarlin and nadolskit November 13, 2024 23:10

jamesbraza self-assigned this Nov 13, 2024

dosubot bot added the size:L This PR changes 100-499 lines, ignoring generated files. label Nov 13, 2024

jamesbraza force-pushed the fixing-unsure branch 2 times, most recently from ba3c627 to 44cd66e Compare November 13, 2024 23:16

whitead reviewed Nov 13, 2024

jamesbraza force-pushed the fixing-unsure branch from 44cd66e to f381263 Compare November 14, 2024 00:01

jamesbraza mentioned this pull request Nov 14, 2024

Removed tool_names validation for gen_answer being present #685

Merged

jamesbraza requested a review from whitead November 14, 2024 00:13

jamesbraza force-pushed the fixing-unsure branch from f381263 to d4e2221 Compare November 14, 2024 00:15

whitead approved these changes Nov 14, 2024

dosubot bot added the lgtm This PR has been approved by a maintainer label Nov 14, 2024

jamesbraza force-pushed the fixing-unsure branch 4 times, most recently from 4f7782a to cccdaff Compare November 15, 2024 07:04

jamesbraza added 5 commits November 15, 2024 11:36

Created Complete tool

4501041

Updated prompt and env for the present of a complete tool, with a test

2a35ecc

Now when encountering empty tool calls, continuing onwards

f67f24e

Updated stored configs for new tools

a826e33

Moved GradablePaperQAEnvironment.step to use the answer, removing GenerateAnswer parsing complexity

1a975d0

jamesbraza force-pushed the fixing-unsure branch from cccdaff to 1a975d0 Compare November 15, 2024 19:38

jamesbraza merged commit 9e8ed75 into main Nov 15, 2024
5 checks passed

jamesbraza deleted the fixing-unsure branch November 15, 2024 21:41
