This repo contains two notebooks to help run evaluations of DCAPI's chat functionality.
- PrepareEvaluationData - Takes a list of questions and gets answers from DCAPI. Outputs a spreadsheet to be used as input for the ScoreAnswers notebook (or for Azure evaluations).
- ScoreAnswers - Takes a spreadsheet of question, answer, and ground_truths produced by PrepareEvaluationData and scores the responses using AWS Bedrock.

- PrepareEvaluationData requires you to obtain a DCAPI authorization token and set up environment variables.
- ScoreAnswers requires you to be logged in as either a staging or production user (log in from your terminal before launching your Jupyter notebook).
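
Because a missing token usually only surfaces once the first DCAPI request fails, it can help to check the environment at the top of the notebook. The following is a minimal sketch, not part of the notebooks themselves: the variable name `DCAPI_TOKEN` is a hypothetical placeholder, so substitute whatever names the notebook's environment setup actually uses.

```python
import os

# Hypothetical variable name for illustration only; replace it with the
# names the PrepareEvaluationData notebook actually expects.
REQUIRED_VARS = ["DCAPI_TOKEN"]


def missing_env_vars(names, environ=None):
    """Return the names that are unset or empty in the given environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


def require_env_vars(names, environ=None):
    """Raise a RuntimeError listing every required variable that is missing."""
    missing = missing_env_vars(names, environ)
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
```

Calling `require_env_vars(REQUIRED_VARS)` in the first cell would stop the run early with a message naming every variable that still needs to be set.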
Python virtual environments can be a great way to bundle a collection of libraries for a specific research area or project and keep it separate from other activities. There are two steps: First, you must create the virtual environment; second, you must install the virtual environment as a Jupyter kernel.
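
The two steps can be sketched in Python using the standard-library `venv` module and the `ipykernel` package. This is one possible approach rather than the repo's prescribed setup; the directory name `.venv` and the kernel name `dcapi-eval` in the usage note are placeholders chosen for illustration.

```python
import subprocess
import sys
import venv
from pathlib import Path


def env_python(env_dir):
    """Return the path to the interpreter inside the virtual environment."""
    env_dir = Path(env_dir)
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def kernel_commands(env_dir, kernel_name):
    """Build the commands that register the environment as a Jupyter kernel."""
    py = str(env_python(env_dir))
    return [
        # ipykernel must be installed inside the environment itself.
        [py, "-m", "pip", "install", "ipykernel"],
        # Register the environment under the given kernel name for the current user.
        [py, "-m", "ipykernel", "install", "--user", "--name", kernel_name],
    ]


def create_env_and_kernel(env_dir, kernel_name):
    """Step 1: create the environment. Step 2: install it as a Jupyter kernel."""
    venv.create(env_dir, with_pip=True)
    for cmd in kernel_commands(env_dir, kernel_name):
        subprocess.run(cmd, check=True)
```

For example, `create_env_and_kernel(".venv", "dcapi-eval")` would create the environment in `.venv` and make a kernel named `dcapi-eval` selectable in Jupyter. Installing `ipykernel` requires network access.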
Here are some resources describing how to do this: