Commit
chore: Updated requirements, README and fixed incompatibilities.
anirbanbasu committed Sep 18, 2024
1 parent 5376de6 commit 2ef751b
Showing 6 changed files with 111 additions and 113 deletions.
6 changes: 0 additions & 6 deletions .env.docker
@@ -49,11 +49,5 @@ LLM__TOP_K = "40"
LLM__REPEAT_PENALTY = "1.1"
LLM__SEED = "1"

# Vector storage
# Qdrant URL assuming that it is on the Docker host
VECTORDB__QDRANT_URL = "http://host.docker.internal:6333"
# Only required if you are using Qdrant on the cloud
VECTORDB__QDRANT_API_KEY = "your-qdrant-api-key"

# Tavily
TAVILY_API_KEY = "your-tavily-api-key"
78 changes: 43 additions & 35 deletions README.md
@@ -10,7 +10,7 @@

## Overview

The DQA aka _difficult questions attempted_ project aims to make large language models attempt difficult questions through an agent-based architecture. The project utilises agents and tools. This project is inspired by a tutorial [^1] from [Dean Sacoransky](https://www.linkedin.com/in/dean-sacoransky-6a671119a/). Unlike the tutorial's use of the [LangGraph framework from LangChain](https://langchain-ai.github.io/langgraph/) for building agents, this project makes use of [LlamaIndex Workflows](https://docs.llamaindex.ai/en/stable/module_guides/workflow/).
The DQA aka _difficult questions attempted_ project utilises one or more agents to help large language models (LLMs) perform _multi-hop question answering_ (MHQA). This project is inspired by a tutorial [^1] from [Dean Sacoransky](https://www.linkedin.com/in/dean-sacoransky-6a671119a/). Unlike the tutorial's use of the [LangGraph framework from LangChain](https://langchain-ai.github.io/langgraph/) for building agents, this project makes use of [LlamaIndex Workflows](https://docs.llamaindex.ai/en/stable/module_guides/workflow/).

### An example of a difficult question and LLM responses
The tutorial uses the question _Which David Fincher film that stars Edward Norton does not star Brad Pitt?_ as a litmus test for assessing new AI systems. The answer is supposed to be _None_, but at the time of writing the tutorial (26 August 2024), the author states that ChatGPT's `gpt-4o` model generates the following response.
@@ -25,25 +25,29 @@ This project implements an agent-based framework akin to the one mentioned in th

### A simpler difficult question
Let's focus on a slightly simpler test question that nonetheless baffles ChatGPT `gpt-4o`. Let's ask _Tell me the result of multiplying the number of 'r's in the word 'strawberry' with the sum of 3 and 4. Explain the process._ ChatGPT hopelessly responds to this with the following.
> Let's break down the problem step by step:
>
>Find the number of 'r's in the word "strawberry": The word "strawberry" contains 2 'r's.
>
> Find the sum of 3 and 4: $3+4=7$.
>
> Multiply the number of 'r's by the sum: $2 \times 7=14$.
>
> So, the result of multiplying the number of 'r's in "strawberry" by the sum of 3 and 4 is 14.

<div style="background-color: #707070; padding: 0.25rem; margin: 0.75rem">
Let's break down the problem step by step:<br/><br/>
Find the number of 'r's in the word "strawberry": The word "strawberry" contains 2 'r's.<br/><br/>
Find the sum of 3 and 4: $3+4=7$.

Multiply the number of 'r's by the sum: $2 \times 7=14$.

So, the result of multiplying the number of 'r's in "strawberry" by the sum of 3 and 4 is 14.
</div>

While ChatGPT did not make mistakes with the basic arithmetic operations, it miscounted the number of 'r's in the word 'strawberry'. In contrast, the DQA workflow gets it right with the following answer, as of September 6, 2024, surprisingly using the `gpt-4o-mini` model.
>Final Answer: The result of multiplying the number of 'r's in the word 'strawberry' (which is 3) with the sum of 3 and 4 (which is 7) is 21.
>
> Detailed Explanation: To arrive at the final answer, we first determined the number of 'r's in the word 'strawberry'. The analysis revealed that there are 3 'r's in the word. Next, we calculated the sum of 3 and 4, which is 7. Finally, we multiplied these two results together: 3 (the number of 'r's) multiplied by 7 (the sum of 3 and 4) equals 21. Therefore, the final result is 21.

<div style="background-color: #707070; padding: 0.25rem; margin: 0.75rem">
Final Answer: The result of multiplying the number of 'r's in the word 'strawberry' (which is 3) with the sum of 3 and 4 (which is 7) is 21.<br/><br/>

Detailed Explanation: To arrive at the final answer, we first determined the number of 'r's in the word 'strawberry'. The analysis revealed that there are 3 'r's in the word. Next, we calculated the sum of 3 and 4, which is 7. Finally, we multiplied these two results together: 3 (the number of 'r's) multiplied by 7 (the sum of 3 and 4) equals 21. Therefore, the final result is 21.
</div>

The `gpt-4o-mini` model is able to count the number of 'r's correctly because DQA lets it call a function that counts the occurrences of a specific character or sequence of characters in a string.
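
A minimal sketch of such a tool function is shown below (the name and signature are illustrative assumptions, not DQA's actual implementation):

```python
def count_occurrences(haystack: str, needle: str) -> int:
    """Count non-overlapping occurrences of a character or substring.

    A plain string operation is deterministic, unlike an LLM's
    token-based (and hence unreliable) character counting.
    """
    return haystack.lower().count(needle.lower())


# The check that trips up `gpt-4o`:
assert count_occurrences("strawberry", "r") == 3
```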

### The agent workflow
The approximate workflow for DQA can be summarised as follows.
The approximate current workflow for DQA can be summarised as follows.
![Workflow](./diagrams/workflow.svg)

The DQA workflow uses a [self-discover](https://arxiv.org/abs/2402.03620) "agent" to produce a reasoning structure, but not to answer the question. Similar to the tutorial [^1], the DQA workflow performs query decomposition with respect to the reasoning structure to ensure that complex queries are not directly sent to the LLM. Instead, sub-questions (i.e., decompositions of the complex query) that help answer the complex query are sent. The workflow further optimises the sub-questions through a query refinement step, which loops, if necessary, up to a maximum number of allowed iterations.
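
For a flavour of how such steps chain together as events in LlamaIndex Workflows, here is a minimal sketch (the workflow, event and step names are invented for illustration and greatly simplify DQA's actual workflow):

```python
from llama_index.core.workflow import (
    Event,
    StartEvent,
    StopEvent,
    Workflow,
    step,
)


class SubQuestionsEvent(Event):
    """Carries the decomposed sub-questions between steps."""

    sub_questions: list[str]


class DQASketchWorkflow(Workflow):
    @step
    async def decompose(self, ev: StartEvent) -> SubQuestionsEvent:
        # A real implementation would prompt an LLM, guided by the
        # self-discovered reasoning structure, to split ev.query.
        return SubQuestionsEvent(sub_questions=[ev.query])

    @step
    async def answer(self, ev: SubQuestionsEvent) -> StopEvent:
        # A real implementation would hand each sub-question to a
        # tool-using ReAct agent and collate the answers.
        return StopEvent(result=f"Answered {len(ev.sub_questions)} sub-question(s).")


# Usage (inside an asyncio event loop):
#   result = await DQASketchWorkflow(timeout=120).run(query="...")
```
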
@@ -54,22 +58,25 @@ When all ReAct workflows have finished, the final step for answer generation col

### Response to the initial difficult question
Recalling the litmus test question (i.e., _Which David Fincher film that stars Edward Norton does not star Brad Pitt?_), the response from DQA with `gpt-4o-mini` is correct (the answer is indeed _none_), albeit long-winded.
> The David Fincher film that stars Edward Norton but does not feature Brad Pitt is **none**. The only film directed by David Fincher that includes both Edward Norton and Brad Pitt is Fight Club (1999). In this film, Edward Norton plays the unnamed narrator, while Brad Pitt portrays Tyler Durden. Therefore, there are no David Fincher films starring Edward Norton that exclude Brad Pitt.
>
> To summarize:
>
> - Film featuring both Edward Norton and Brad Pitt: Fight Club (1999)
> - Other films directed by David Fincher include:
> - Alien 3 (1992)
> - Se7en (1995)
> - The Game (1997)
> - Panic Room (2002)
> - Zodiac (2007)
> - The Curious Case of Benjamin Button (2008)
> - The Social Network (2010)
> - The Girl with the Dragon Tattoo (2011)
> - Gone Girl (2014)
> - Mank (2020)

<div style="background-color: #707070; padding: 0.25rem; margin: 0.75rem">
The David Fincher film that stars Edward Norton but does not feature Brad Pitt is **none**. The only film directed by David Fincher that includes both Edward Norton and Brad Pitt is Fight Club (1999). In this film, Edward Norton plays the unnamed narrator, while Brad Pitt portrays Tyler Durden. Therefore, there are no David Fincher films starring Edward Norton that exclude Brad Pitt.<br/><br/>

To summarize:

- Film featuring both Edward Norton and Brad Pitt: Fight Club (1999)
- Other films directed by David Fincher include:
- Alien 3 (1992)
- Se7en (1995)
- The Game (1997)
- Panic Room (2002)
- Zodiac (2007)
- The Curious Case of Benjamin Button (2008)
- The Social Network (2010)
- The Girl with the Dragon Tattoo (2011)
- Gone Girl (2014)
- Mank (2020)
</div>

### Inconsistency and the need for improvement
The generated responses depend heavily on the LLM, which makes them very inconsistent. In addition, while the workflow passes on the examples shown here, there remains room for improvement with respect to wasteful LLM calls, wasteful tool calls, consistency of the answer from the same LLM, and the ability to generate reliable answers from low-parameter quantised models (available on Ollama, for instance), amongst others.
@@ -112,7 +119,7 @@ If necessary, you can uninstall everything previously installed by `pip` (in the
python -m pip freeze | cut -d "@" -f1 | xargs pip uninstall -y
```

In addition to Python dependencies, see the installation instructions of [Ollama](https://ollama.com/download) and that of [Qdrant](https://qdrant.tech/documentation/guides/installation/). You can install either of these on separate machines. Download the [tool calling model of Ollama](https://ollama.com/search?c=tools) that you want to use, e.g., `llama3.1` or `mistral-nemo`.
In addition to the Python dependencies, see the installation instructions for [Ollama](https://ollama.com/download). You can install it on a separate machine. Download the [tool-calling Ollama model](https://ollama.com/search?c=tools) that you want to use, e.g., `llama3.1` or `mistral-nemo`.

## Usage (local)

@@ -122,11 +129,12 @@ Make a copy of the file `.env.docker` in the _working directory_ as a `.env` fil
cp .env.docker .env
```

Change all occurrences of `host.docker.internal` to `localhost` or some other host or IP assuming that you have both Ollama and Qdrant available at ports 11434 and 6333, respectively, on your preferred host. Set the Ollama model to the tool calling model that you have downloaded on your Ollama installation. Set the value of the `LLM_PROVIDER` to the provider that you want to use. Supported names are `Anthropic`, `Cohere`, `Groq`, `Ollama` and `Open AI`.
Change all occurrences of `host.docker.internal` to `localhost`, or to some other host or IP, assuming that Ollama is available on port 11434 on your preferred host. Set the Ollama model to the tool-calling model that you have downloaded in your Ollama installation. Set the value of `LLM_PROVIDER` to the provider that you want to use. Supported names are `Anthropic`, `Cohere`, `Groq`, `Ollama` and `Open AI`.

You can use the environment variable `SUPPORTED_LLM_PROVIDERS` to further restrict the supported LLM providers to a subset of the aforementioned, such as by setting the value to `Groq:Ollama` to allow only Groq and Ollama for a particular deployment of this app. Note that the only separating character between LLM provider names is a `:`. If you add a provider that is not in the aforementioned set, the app will throw an error and refuse to start.

Add the API keys for [Anthropic](https://console.anthropic.com/), [Cohere](https://dashboard.cohere.com/welcome/login), [Groq](https://console.groq.com/keys) or [Open AI](https://platform.openai.com/docs/overview) if you want to use any of these. In addition, add [an API key of Tavily](https://app.tavily.com/sign-in). Qdrant API key is not necessary if you are not using [Qdrant cloud](https://qdrant.tech/documentation/qdrant-cloud-api/).
Add the API keys for [Anthropic](https://console.anthropic.com/), [Cohere](https://dashboard.cohere.com/welcome/login), [Groq](https://console.groq.com/keys) or [Open AI](https://platform.openai.com/docs/overview) if you want to use any of these. In addition, add [an API key for Tavily](https://app.tavily.com/sign-in).
<!-- Qdrant API key is not necessary if you are not using [Qdrant cloud](https://qdrant.tech/documentation/qdrant-cloud-api/). -->
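
For example, a minimal `.env` fragment for a Groq-and-Ollama deployment might contain the following (illustrative values in the style of `.env.docker`, which documents the full set of variables):

```
LLM_PROVIDER = "Ollama"
SUPPORTED_LLM_PROVIDERS = "Groq:Ollama"
TAVILY_API_KEY = "your-tavily-api-key"
```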

With all this setup done, run the following to start the web server. The web server serves a web user interface as well as a REST API. It is not configured to use HTTPS.

@@ -138,9 +146,9 @@ The web UI will be available at [http://localhost:7860](http://localhost:7860).

## Usage (Docker)

In the `.env.docker`, both Ollama and Qdrant are expected to be available at ports 11434 and 6333, respectively, on your Docker host, i.e., `host.docker.internal`. Set them to some other host(s), if that is where your Ollama and Qdrant servers are available. Set the Ollama model to the tool calling model that you have downloaded on your Ollama installation.
In `.env.docker`, Ollama is expected to be available on port 11434 on your Docker host, i.e., `host.docker.internal`. Set it to some other host if that is where your Ollama server is available. Set the Ollama model to the tool-calling model that you have downloaded in your Ollama installation.

Set the value of the `LLM_PROVIDER` to the provider that you want to use and add the API keys for Anthropic, Cohere, Groq and Open AI LLM providers as well as that of Tavily and optionally Qdrant as mentioned above in the **Usage (local)** section.
Set the value of the `LLM_PROVIDER` to the provider that you want to use and add the API keys for the Anthropic, Cohere, Groq and Open AI LLM providers, as well as that of Tavily, as mentioned above in the **Usage (local)** section.

With all this setup done, and assuming that you have Docker installed, you can build an image of the DQA app, create a container and start it as follows.
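
(The exact commands are collapsed in this diff view; the following is a typical sequence, assuming a `Dockerfile` at the repository root, with illustrative image and container names.)

```bash
# Build the image from the repository root (assumes a Dockerfile is present).
docker build -t dqa .

# Create and start a container, publishing the web UI port.
# Depending on the Dockerfile, .env.docker may be baked into the image
# or need to be supplied at runtime (e.g., with --env-file).
docker run --name dqa -p 7860:7860 dqa
```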

99 changes: 45 additions & 54 deletions requirements-frozen.txt
@@ -9,14 +9,14 @@ arxiv==2.1.3
asttokens==2.4.1
attrs==24.2.0
beautifulsoup4==4.12.3
boto3==1.35.14
botocore==1.35.14
boto3==1.35.22
botocore==1.35.22
cachetools==5.5.0
certifi==2024.8.30
cfgv==3.4.0
charset-normalizer==3.3.2
click==8.1.7
cohere==5.9.1
cohere==5.9.2
colorama==0.4.6
contourpy==1.3.0
cycler==0.12.1
@@ -26,36 +26,31 @@ Deprecated==1.2.14
dirtyjson==1.0.8
distlib==0.3.8
distro==1.9.0
duckduckgo_search==6.2.11
duckduckgo_search==6.2.12
executing==2.1.0
fastapi==0.112.4
fastapi==0.115.0
fastavro==1.9.7
feedparser==6.0.11
ffmpy==0.4.0
filelock==3.15.4
filelock==3.16.1
fonttools==4.53.1
frozendict==2.4.4
frozenlist==1.4.1
fsspec==2024.9.0
google-auth==2.34.0
gradio==4.43.0
gradio==4.44.0
gradio_client==1.3.0
greenlet==3.0.3
grpcio==1.66.1
grpcio-tools==1.66.1
greenlet==3.1.0
h11==0.14.0
h2==4.1.0
hpack==4.0.0
html5lib==1.1
httpcore==1.0.5
httpx==0.27.2
httpx-sse==0.4.0
huggingface-hub==0.24.6
hyperframe==6.0.1
huggingface-hub==0.25.0
icecream==2.1.3
identify==2.6.0
idna==3.8
importlib_resources==6.4.4
identify==2.6.1
idna==3.10
importlib_resources==6.4.5
iniconfig==2.0.0
ipython==8.27.0
jedi==0.19.1
@@ -66,20 +61,20 @@ joblib==1.4.2
jsonpickle==3.3.0
kiwisolver==1.4.7
llama-cloud==0.0.17
llama-index==0.11.7
llama-index-agent-openai==0.3.1
llama-index-cli==0.3.0
llama-index-core==0.11.7
llama-index-embeddings-openai==0.2.4
llama-index-indices-managed-llama-cloud==0.3.0
llama-index==0.11.10
llama-index-agent-openai==0.3.2
llama-index-cli==0.3.1
llama-index-core==0.11.10
llama-index-embeddings-openai==0.2.5
llama-index-indices-managed-llama-cloud==0.3.1
llama-index-legacy==0.9.48.post3
llama-index-llms-anthropic==0.3.0
llama-index-llms-anthropic==0.3.1
llama-index-llms-cohere==0.3.0
llama-index-llms-groq==0.2.0
llama-index-llms-ollama==0.3.1
llama-index-llms-openai==0.2.3
llama-index-llms-ollama==0.3.2
llama-index-llms-openai==0.2.8
llama-index-llms-openai-like==0.2.0
llama-index-multi-modal-llms-openai==0.2.0
llama-index-multi-modal-llms-openai==0.2.1
llama-index-program-openai==0.2.0
llama-index-question-gen-openai==0.2.0
llama-index-readers-file==0.2.1
@@ -89,26 +84,26 @@ llama-index-tools-duckduckgo==0.2.1
llama-index-tools-tavily-research==0.2.0
llama-index-tools-wikipedia==0.2.0
llama-index-tools-yahoo-finance==0.2.0
llama-index-utils-openai==0.1.0
llama-index-utils-workflow==0.2.1
llama-index-vector-stores-qdrant==0.3.0
llama-parse==0.5.2
llama-parse==0.5.5
lxml==5.3.0
markdown-it-py==3.0.0
MarkupSafe==2.1.5
marshmallow==3.22.0
matplotlib==3.9.2
matplotlib-inline==0.1.7
mdurl==0.1.2
multidict==6.0.5
multidict==6.1.0
multitasking==0.0.11
mypy-extensions==1.0.0
nest-asyncio==1.6.0
networkx==3.3
nltk==3.9.1
nodeenv==1.9.1
numpy==1.26.4
ollama==0.3.2
openai==1.44.0
ollama==0.3.3
openai==1.46.0
orjson==3.10.7
packaging==24.1
pandas==2.2.2
@@ -117,50 +112,46 @@ parso==0.8.4
peewee==3.17.6
pexpect==4.9.0
pillow==10.4.0
platformdirs==4.2.2
platformdirs==4.3.6
pluggy==1.5.0
portalocker==2.10.1
pre-commit==3.8.0
primp==0.6.1
primp==0.6.2
prompt_toolkit==3.0.47
protobuf==5.28.0
ptyprocess==0.7.0
pure_eval==0.2.3
pyasn1==0.6.0
pyasn1_modules==0.4.0
pydantic==2.9.0
pydantic_core==2.23.2
pyasn1==0.6.1
pyasn1_modules==0.4.1
pydantic==2.9.2
pydantic_core==2.23.4
pydub==0.25.1
Pygments==2.18.0
pyparsing==3.1.4
pypdf==4.3.1
pytest==8.3.2
pytest==8.3.3
python-dateutil==2.9.0.post0
python-dotenv==1.0.1
python-multipart==0.0.9
pytz==2024.1
pytz==2024.2
pyvis==0.3.2
PyYAML==6.0.2
qdrant-client==1.11.1
regex==2024.7.24
regex==2024.9.11
requests==2.32.3
rich==13.8.0
rich==13.8.1
rsa==4.9
ruff==0.6.4
ruff==0.6.5
s3transfer==0.10.2
safetensors==0.4.5
semantic-version==2.10.0
setuptools==74.1.2
sgmllib3k==1.0.0
shellingham==1.5.4
six==1.16.0
sniffio==1.3.1
soupsieve==2.6
SQLAlchemy==2.0.34
SQLAlchemy==2.0.35
stack-data==0.6.3
starlette==0.38.4
starlette==0.38.5
striprtf==0.0.26
tavily-python==0.4.0
tavily-python==0.5.0
tenacity==8.5.0
tiktoken==0.7.0
tokenizers==0.19.1
@@ -169,17 +160,17 @@ tqdm==4.66.5
traitlets==5.14.3
transformers==4.44.2
typer==0.12.5
types-requests==2.32.0.20240907
types-requests==2.32.0.20240914
typing-inspect==0.9.0
typing_extensions==4.12.2
tzdata==2024.1
urllib3==2.2.2
urllib3==2.2.3
uvicorn==0.30.6
virtualenv==20.26.3
virtualenv==20.26.5
wcwidth==0.2.13
webencodings==0.5.1
websockets==12.0
wikipedia==1.4.0
wrapt==1.16.0
yarl==1.10.0
yarl==1.11.1
yfinance==0.2.43
4 changes: 2 additions & 2 deletions requirements.txt
@@ -12,8 +12,8 @@ llama-index
llama-index-utils-workflow

# LlamaIndex storage
llama-index-vector-stores-qdrant
qdrant_client
# llama-index-vector-stores-qdrant
# qdrant_client

# LlamaIndex LLMs
llama-index-llms-groq