This repository contains a Jupyter notebook that loads data from a CSV file, computes an embedding for each entry using OpenAI's API, and recommends the closest matches found by k-nearest-neighbor (kNN) search over those embeddings.
To use this notebook, you'll need to install the following Python libraries:
- openai [Note: You'll need your API key. See Quickstart.]
- python-dotenv
- pandas
- numpy
- tenacity
- tiktoken
- nomic [Note: Nomic requires an account. See Quickstart.]
The notebook also uses pickle, which is part of the Python standard library and does not need to be installed.
You can install these using pip:
pip install openai python-dotenv pandas numpy tenacity tiktoken nomic
You'll need to set the following environment variables:
- OPENAI_API_KEY: Your OpenAI API key.
You can do this by creating a .env file in the root directory of this project and adding the following line:
OPENAI_API_KEY=your_openai_api_key
Replace your_openai_api_key with your actual OpenAI API key.
The data is expected to be a CSV file named Problem_Intake_CurrentVers_TEST.csv in the source_data directory.
The CSV file should have the following columns:
- date: The date of the data entry.
- need: The need statement.
- contact: The contact information.
- dept: The department associated with the need statement.
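Before running the notebook, it may help to confirm that a CSV file actually has these columns. The sketch below is one way to do that with pandas. The function name load_needs and the sample row are made up for illustration and do not come from the notebook.

```python
import io

import pandas as pd

# Column names as listed above.
REQUIRED_COLUMNS = ["date", "need", "contact", "dept"]


def load_needs(source) -> pd.DataFrame:
    """Hypothetical helper: read the CSV and check the expected columns exist."""
    df = pd.read_csv(source)
    missing = [col for col in REQUIRED_COLUMNS if col not in df.columns]
    if missing:
        raise ValueError(f"CSV is missing required columns: {missing}")
    return df


# Illustrative in-memory CSV; a real run would pass the path to the file
# in the source_data directory instead.
sample = io.StringIO(
    "date,need,contact,dept\n"
    "2023-01-05,Need a faster intake form,a@example.com,Nursing\n"
)
df = load_needs(sample)
```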
To use the notebook, simply open it in your Jupyter notebook environment and run the cells sequentially. The notebook will:
- Fetch data from the CSV file.
- Compute the embeddings for each need statement using the OpenAI API.
- Provide recommendations based on the closest matches found through these embeddings.
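The recommendation step can be pictured as a nearest-neighbor search over the embedding vectors. The following is a minimal sketch with NumPy, assuming cosine similarity as the distance measure. The notebook's actual metric and code may differ, and the toy vectors stand in for real OpenAI embeddings.

```python
import numpy as np


def nearest_neighbors(embeddings: np.ndarray, query_index: int, k: int = 3) -> list:
    """Return indices of the k rows most similar to the query row (cosine)."""
    # Normalize each row so that a dot product equals cosine similarity.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / norms
    similarities = unit @ unit[query_index]
    # Exclude the query itself from its own list of neighbors.
    similarities[query_index] = -np.inf
    # Sort by descending similarity and keep the top k indices.
    order = np.argsort(-similarities)
    return order[:k].tolist()


# Toy 2-D "embeddings" for illustration only.
toy = np.array([
    [1.0, 0.0],
    [0.9, 0.1],
    [0.0, 1.0],
    [-1.0, 0.0],
])
neighbors = nearest_neighbors(toy, query_index=0, k=2)
```

With these toy vectors, row 1 points in almost the same direction as row 0, so it ranks first, followed by the orthogonal row 2.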
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
Please make sure to update the tests as appropriate.