Commit

Merge pull request #124 from ibm-client-engineering/shirley-dev-2
Minor updates on preprocessing instruction
Sahaj-IBM authored Jun 7, 2024
2 parents 19bbce3 + b41bd35 commit e0de608
Showing 3 changed files with 3 additions and 2 deletions.
2 changes: 1 addition & 1 deletion docs/3-Use-Cases/NeuralSeek.mdx
@@ -96,7 +96,7 @@ In addition to testing on NeuralSeek, we have written a script to allow testing
We performed Pre-Processing and No OCR, No Pre-Processing and No OCR, and OCR experiments using the testing notebook.
You can run the different experiments just by changing the Discovery collection ID and providing the questions and expected responses as string arrays.
It uses the NeuralSeek API.
Please refer to [Testing Notebook](Tables%20Testing.ipynb) for detailed steps.
Please refer to [Testing Notebook](testing.ipynb) for detailed steps.

## Download Logs
- Proceed to API on Integrate tab
3 changes: 2 additions & 1 deletion docs/3-Use-Cases/Watson Discovery.mdx
@@ -8,7 +8,8 @@ custom_edit_url: null
# Data Preprocessing

- Data containing tables needs to be pre-processed so that the LLM can properly read the content in tables.
- Before uploading to watson discovery, run the following script on your files and upload the files generated by the output to Watson Discovery: [link](preprocess_file.ipynb)
- Before uploading to Watson Discovery, run the following script on your files if needed and upload the files it generates to Watson Discovery: [link](preprocess_file.ipynb). The provided script was written for the specific set of documents that we used, so it will need additional changes to work with your own PDF documents.
- The script iterates through each page of the PDF file, finds all the tables, and transforms each table into natural language using an LLM. Having tables in natural language helps with question answering. The code preprocesses PDFs and outputs HTML files.

# Create Project and Collection

File renamed without changes.
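
The preprocessing flow described in the `Watson Discovery.mdx` change (go through each page, find the tables, rewrite each table as natural language, emit HTML) could be sketched roughly as below. This is not the code in `preprocess_file.ipynb`. It assumes the tables have already been extracted from the PDF as nested lists (pages, then tables, then rows of cells), and it replaces the LLM call with a fixed sentence template so the example runs on its own. The names `table_to_sentences` and `pages_to_html` are hypothetical.

```python
from html import escape


def table_to_sentences(table):
    """Turn one table (first row is the header) into one sentence per data row.

    Assumption: this fixed template stands in for the LLM call that the
    real script uses; it is not the author's prompt or output format.
    """
    if not table:
        return []
    header, *rows = table
    sentences = []
    for row in rows:
        # Pair each cell with its column name and skip empty cells.
        parts = [
            f"{col} is {val}"
            for col, val in zip(header, row)
            if val not in (None, "")
        ]
        if parts:
            sentences.append("; ".join(parts) + ".")
    return sentences


def pages_to_html(pages):
    """Build one HTML document from pre-extracted tables.

    Assumption: `pages` is a list of pages, each page a list of tables,
    each table a list of rows, each row a list of cell values.
    """
    body = []
    for page_number, tables in enumerate(pages, start=1):
        body.append(f"<h2>Page {page_number}</h2>")
        for table in tables:
            for sentence in table_to_sentences(table):
                # Escape cell text so it cannot break the HTML output.
                body.append(f"<p>{escape(sentence)}</p>")
    return "<html><body>\n" + "\n".join(body) + "\n</body></html>"


if __name__ == "__main__":
    # Made-up sample data: one page containing one small table.
    sample_pages = [[[["Plan", "Price"], ["Basic", "$10"], ["Pro", "$25"]]]]
    print(pages_to_html(sample_pages))
```

In a real pipeline the template inside `table_to_sentences` would be swapped for an LLM request, and the nested lists would come from whichever PDF table extractor fits your documents.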