Incorporate llamaindex search in the accuracy testing #16

Merged
Changes from 17 commits
23 commits
fb662ee
issue #6: abenassi Google-Search-API
ibrahim-kabir Mar 22, 2024
1e84342
issue #6: Nv7-GitHub googlesearch
ibrahim-kabir Mar 22, 2024
c463607
issue #6: bing search by scrapping
ibrahim-kabir Mar 22, 2024
84f42ba
issue #6: google_search completed
ibrahim-kabir Mar 22, 2024
d3849a2
issue #7: Removed punction mark on md files, added total number of 0,…
ibrahim-kabir Mar 21, 2024
bf895a3
issue #6: google_search completed
ibrahim-kabir Mar 22, 2024
232208a
issue #6: google api incorporation
ibrahim-kabir Mar 22, 2024
17a6c80
issue #6: Bing Search works
ibrahim-kabir Mar 25, 2024
18f1df2
issue #6: Refactoring + Bing Search + Bing Filtered Search
ibrahim-kabir Mar 26, 2024
885f447
issue #6: Refactored + Fix markdown issue on links + Add parsing scri…
ibrahim-kabir Mar 27, 2024
7131cc5
issue #6: typos
ibrahim-kabir Apr 2, 2024
e7fe32f
issue #6: typo
ibrahim-kabir Apr 2, 2024
4b3c473
issue #14: add llamaindex search
ibrahim-kabir Apr 3, 2024
9a5838b
issue #14: add top results table
ibrahim-kabir Apr 3, 2024
8f3e312
issue #14: FINESSE_USAGE new instructions
ibrahim-kabir Apr 3, 2024
bb68ba0
issue #14: update cache_path
ibrahim-kabir Apr 3, 2024
67bb3d0
issue #14: update tests
ibrahim-kabir Apr 3, 2024
d63b6b2
issue #14: if name is main
ibrahim-kabir Apr 4, 2024
0356f49
issue #14: add new line on json file
ibrahim-kabir Apr 4, 2024
78184c3
issue #14: sort imports + output folder in env
ibrahim-kabir Apr 4, 2024
3d883f8
issue #14: no need of on_start func
ibrahim-kabir Apr 5, 2024
6d14fd7
Merge branch 'main' into 14-incorporate-llamaindex-search-in-the-accu…
ibrahim-kabir Apr 11, 2024
b195592
issue #14: fix checks fails + added space after comma
ibrahim-kabir Apr 11, 2024
3 changes: 3 additions & 0 deletions .env.template
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
BING_SEARCH_KEY =
BING_ENDPOINT =
CACHE_PATH =
12 changes: 9 additions & 3 deletions .gitignore
@@ -41,7 +41,13 @@ keys/
flask_session/

# Ignore local QnA json files
QnA
QnA/

# Ignore output of api-test
output
# Ignore output of api-test and from the scripts
output/

# Ignore input of the scripts
input/

# Ignore the generated files from cache
cache/
86 changes: 81 additions & 5 deletions finesse/FINESSE_USAGE.md
@@ -1,8 +1,37 @@
# How to use the Finesse Locust script

This tool simplifies the process of comparing different search engines and
assessing their accuracy. It's designed to be straightforward, making it easy
to understand and use.
assessing their accuracy. It's designed to be straightforward, making it easy to
understand and use.

## Configuration

Before using the Finesse Locust script, make sure to set up the necessary
configuration. Follow the steps below:

1. Create a `.env` file in the root directory of the project if it doesn't
already exist.

2. Copy the contents of the `.env.template` file and paste them into the `.env`
file.

3. Replace the placeholder values in the `.env` file with your actual secrets
and configuration settings. In particular, you will need to provide the
necessary credentials for the Bing Search API.
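
As a rough illustration of step 3, the sketch below reads the three settings from `.env.template` out of an environment mapping and fails early when a Bing credential is missing. It assumes the `.env` values have already been loaded into the process environment (for example by a dotenv loader); the function name `load_search_config` and the `cache/` fallback are hypothetical and not taken from the project.

```python
import os


def load_search_config(env=None):
    """Return Bing and cache settings read from an environment mapping.

    Hypothetical helper: the real project may read its settings differently.
    """
    env = os.environ if env is None else env
    required = ("BING_SEARCH_KEY", "BING_ENDPOINT")
    # Treat empty or whitespace-only values as missing, as in a fresh template.
    missing = [name for name in required if not env.get(name, "").strip()]
    if missing:
        raise RuntimeError("Missing settings in .env: " + ", ".join(missing))
    return {
        "bing_search_key": env["BING_SEARCH_KEY"].strip(),
        "bing_endpoint": env["BING_ENDPOINT"].strip(),
        # Assumption: fall back to a local "cache/" folder when CACHE_PATH is empty.
        "cache_path": env.get("CACHE_PATH", "").strip() or "cache/",
    }
```

Passing a plain dictionary instead of `os.environ` makes the check easy to exercise without touching real credentials.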

## Caching

Finesse supports caching to improve performance and reduce costs. If you already
have a cache directory from a previous usage, you can reuse it by placing it in
the `finesse` directory. If you don't have a cache directory, Finesse will
automatically create one for you.

The cache directory is used to store expensive API requests, so they don't need
to be repeated unnecessarily. This can significantly speed up subsequent runs of
the Finesse Locust script.

Make sure that the cache directory has the appropriate read and write
permissions for the user running the script.
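
The idea of storing expensive API responses on disk can be sketched as follows. This is a minimal illustration under stated assumptions, not the project's actual cache: `cached_call`, the SHA-256 file naming, and the JSON storage format are all hypothetical choices.

```python
import hashlib
import json
from pathlib import Path


def cached_call(cache_dir, key, fetch):
    """Return the cached result for `key`, calling `fetch()` only on a miss."""
    cache_dir = Path(cache_dir)
    # Create the cache directory automatically when it does not exist yet.
    cache_dir.mkdir(parents=True, exist_ok=True)
    # Hash the key so that any query string maps to a safe file name.
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    entry = cache_dir / f"{digest}.json"
    if entry.exists():
        return json.loads(entry.read_text(encoding="utf-8"))
    result = fetch()  # the expensive request happens only here
    entry.write_text(json.dumps(result), encoding="utf-8")
    return result
```

Because entries are plain files, a cache directory from a previous run can be copied into place and reused, provided the script can read and write it.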

## How it Works

@@ -16,8 +45,8 @@ to understand and use.
- `static`: Static search engine
- `llamaindex`: LlamaIndex search engine
- `--path [directory path]`: Point to the directory with files structured
- `--host [API URL]`: Point to the finesse-backend URL
with JSON files with the following properties:
- `--host [API URL]`: Point to the finesse-backend URL with JSON files with
the following properties:
- `score`: The score of the page.
- `crawl_id`: The unique identifier associated with the crawl table.
- `chunk_id`: The unique identifier of the chunk.
@@ -43,7 +72,8 @@ to understand and use.
- **Round trip time**
- Measure round trip time of each request
- **Summary statistical value**
- Measure the average, median, standard deviation, minimum and maximal accuracy scores and round trip time
- Measure the average, median, standard deviation, minimum and maximal
accuracy scores and round trip time
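
The summary values listed above can be computed with Python's standard `statistics` module. The sketch below is illustrative only: the function name `summarize` is hypothetical, and it uses the sample standard deviation, which may differ from what the tool reports.

```python
import statistics


def summarize(values):
    """Return mean, median, standard deviation, minimum and maximum."""
    values = list(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Sample standard deviation needs at least two points; use 0.0 otherwise.
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
        "min": min(values),
        "max": max(values),
    }
```

The same function applies to a list of accuracy scores and to a list of round trip times.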

## Diagram

@@ -100,3 +130,49 @@ Accuracy statistical summary:

This example shows the CLI output of the tool, which analyzes search results
from Azure Search and provides an accuracy score for Finesse.

## Scripts

### XLSX Converter to JSON 📄

This script converts data from an Excel file (.xlsx) into JSON format. It is
used for questions created by non-developers, for whom Excel files are easier
to read than JSON files.

### Usage

1. **Input Excel File**: Place the Excel file containing the data in the
specified input folder (`--input-folder`). By default, the input folder is
set to `'finesse/scripts/input/'`.

2. **Output Folder**: Specify the folder where the resulting JSON files will be
saved using the `--output-folder` argument. By default, the output folder is
set to `'finesse/scripts/output/'`.

3. **Input File Name**: Provide the name of the input Excel file using the
`--file-name` argument.

4. **Worksheet Name**: Specify the name of the worksheet containing the data
using the `--sheet-name` argument. By default, it is set to `'To fill'`.
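
The four arguments and their documented defaults could be declared with `argparse` roughly as below. This is a sketch, not the script's actual source: `build_parser` is a hypothetical name, and treating `--file-name` as required is an assumption, since no default is documented for it.

```python
import argparse


def build_parser():
    """Build a parser mirroring the arguments described above."""
    parser = argparse.ArgumentParser(
        description="Convert an XLSX worksheet into JSON files."
    )
    parser.add_argument("--input-folder", default="finesse/scripts/input/")
    parser.add_argument("--output-folder", default="finesse/scripts/output/")
    # Assumption: the file name has no default, so it must be supplied.
    parser.add_argument("--file-name", required=True)
    parser.add_argument("--sheet-name", default="To fill")
    return parser
```

Calling `build_parser().parse_args(["--file-name", "questions.xlsx"])` would then leave the folders and the worksheet name at their defaults.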

### Example Command

```bash
python finesse/scripts/xlsx_converter_json.py --input-folder finesse/scripts/input/ --output-folder finesse/scripts/output/ --file-name Finesse_questions_for_testing.xlsx --sheet-name "To fill"
```

Replace `Finesse_questions_for_testing.xlsx` with the actual name of your input
Excel file and `"To fill"` with the name of the worksheet containing the data.

### Output

The script generates individual JSON files for each row of data in the specified
output folder. Each JSON file contains the following fields:

- `question`: The question extracted from the Excel file.
- `answer`: The answer extracted from the Excel file.
- `title`: The title(s) extracted from specified columns in the Excel file.
- `url`: The URL(s) extracted from specified columns in the Excel file.

Upon completion, the script prints "Conversion terminée !" (Conversion
completed!) to indicate that the conversion process is finished.
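
The output step can be pictured with the sketch below, which writes one JSON file per row with the four fields listed above. It assumes the worksheet rows have already been read into dictionaries; the helper name `write_rows_as_json` and the `question_<n>.json` naming scheme are hypothetical and may not match the real script.

```python
import json
from pathlib import Path

FIELDS = ("question", "answer", "title", "url")


def write_rows_as_json(rows, output_folder):
    """Write one JSON file per row and return the list of created paths."""
    out = Path(output_folder)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for index, row in enumerate(rows, start=1):
        # Keep only the documented fields; absent ones become null.
        record = {field: row.get(field) for field in FIELDS}
        target = out / f"question_{index}.json"  # hypothetical file naming
        with target.open("w", encoding="utf-8") as handle:
            json.dump(record, handle, ensure_ascii=False, indent=2)
            handle.write("\n")  # end the file with a newline
        written.append(target)
    return written
```

Setting `ensure_ascii=False` keeps accented characters readable in the generated files, which matters for French-language questions.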