This tool reads PDF medical charts and searches them for predefined diagnosis codes and their descriptions. For each medical chart it writes an index file listing the page number and line number of every probable match. The output still needs to be reviewed by a human, but the tool drastically reduces the time spent reading through charts manually, which can take hours. The output also helps prioritize certain diagnosis codes over others, depending on what needs to be analyzed. Further enhancements could narrow the output according to whatever filters are needed.
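To make the idea of a page-and-line index concrete, here is a minimal sketch of how such matches could be collected from text that has already been extracted from a chart. It is an illustration only and not the tool's actual implementation: the `Match` type, the `index_matches` function, and the matching rule (case-insensitive search for the code or its description) are all assumptions.

```python
import re
from typing import Dict, List, NamedTuple


class Match(NamedTuple):
    code: str
    page: int  # 1-based page number
    line: int  # 1-based line number within the page
    text: str  # the matching line, stripped of surrounding whitespace


def index_matches(pages: List[str], codes: Dict[str, str]) -> List[Match]:
    """Return one Match per (line, code) where the code or its description appears.

    `pages` holds the extracted text of each page; `codes` maps a diagnosis
    code to its description. Both shapes are hypothetical.
    """
    patterns = {}
    for code, description in codes.items():
        # Match the code as a standalone token, so "I10" does not hit "I10.1".
        alternatives = [r"(?<!\w)" + re.escape(code) + r"(?!\.?\w)"]
        if description:
            alternatives.append(re.escape(description))
        patterns[code] = re.compile("|".join(alternatives), re.IGNORECASE)

    matches = []
    for page_no, page_text in enumerate(pages, start=1):
        for line_no, line in enumerate(page_text.splitlines(), start=1):
            for code, pattern in patterns.items():
                if pattern.search(line):
                    matches.append(Match(code, page_no, line_no, line.strip()))
    return matches
```

Calling `index_matches(pages, codes)` returns the matches in page and line order, which is the shape of information a reviewer would need to jump straight to the relevant spot in a chart.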
cd ~/
mkdir my_dir_for_venvs
cd my_dir_for_venvs
python3 -m venv pdf-charts-analyzer-venv
source ~/my_dir_for_venvs/pdf-charts-analyzer-venv/bin/activate
cd ~/
git clone https://github.com/nsb700/pdf-medical-charts-analyzer.git pdf_charts_analyzer_app
cd ~/pdf_charts_analyzer_app
pip install -r requirements.txt
cd ~/pdf_charts_analyzer_app
mkdir corpus_flow
mkdir corpus_flow/input_pdf_charts
mkdir corpus_flow/stg_00
mkdir corpus_flow/stg_01
mkdir corpus_flow/stg_02
mkdir corpus_flow/stg_03
mkdir corpus_flow/stg_04
mkdir corpus_flow/stg_05
mkdir corpus_flow/stg_06
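The same directory layout can also be created from Python, which may be convenient on systems without a POSIX shell. This is a minimal sketch using only the standard library; the function name `create_corpus_flow` is hypothetical and is not part of the repository.

```python
from pathlib import Path
from typing import List

# The input directory plus the staging directories stg_00 through stg_06.
SUBDIRS = ["input_pdf_charts"] + [f"stg_{n:02d}" for n in range(7)]


def create_corpus_flow(app_dir: Path) -> List[Path]:
    """Create corpus_flow and its subdirectories under app_dir, returning their paths."""
    root = Path(app_dir) / "corpus_flow"
    created = []
    for name in SUBDIRS:
        path = root / name
        # exist_ok=True makes the call safe to repeat on an existing layout.
        path.mkdir(parents=True, exist_ok=True)
        created.append(path)
    return created
```

For example, `create_corpus_flow(Path.home() / "pdf_charts_analyzer_app")` would produce the layout built by the `mkdir` commands above.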
Place PDF medical charts in pdf_charts_analyzer_app/corpus_flow/input_pdf_charts
A reference file of the ICD codes to search for is already provided at pdf_charts_analyzer_app/icd_codes/codesfile.csv
You may add or delete ICD codes as required, as long as the same formatting is maintained.
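After editing the codes file, a quick consistency check can catch accidental formatting slips. The sketch below assumes the file is comma-delimited and that "same formatting" means every row has the same number of fields as the first row; the actual column layout is not described here, so the check makes no assumption about column names. The function name `find_inconsistent_rows` is hypothetical.

```python
import csv
from pathlib import Path
from typing import List, Tuple


def find_inconsistent_rows(csv_path: Path) -> List[Tuple[int, int]]:
    """Return (row_number, field_count) for rows that differ from the first row.

    Row numbers are 1-based. Blank lines count as rows with zero fields,
    so they are reported as well.
    """
    with open(csv_path, newline="", encoding="utf-8") as handle:
        rows = list(csv.reader(handle))
    if not rows:
        return []
    expected = len(rows[0])
    return [
        (number, len(row))
        for number, row in enumerate(rows, start=1)
        if len(row) != expected
    ]
```

An empty result means every row has the same field count as the first one; it does not verify that the codes themselves are valid.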
- Run the executable provided at pdf_charts_analyzer_app/main.
- Browse and provide the path to the pdf_charts_analyzer_app/corpus_flow directory
- Browse and provide the path to the pdf_charts_analyzer_app/icd_codes directory
- Click Submit and wait for the tool to finish its processing.
- Output:
  - After processing finishes, the final output CSV index files are stored in pdf_charts_analyzer_app/corpus_flow/stg_05
  - Problematic PDF files that cannot be processed are saved in pdf_charts_analyzer_app/corpus_flow/stg_06
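Once a run has finished, the two output directories can be summarized with a few lines of Python. This is a minimal sketch, assuming the index files carry a `.csv` extension and that anything left in `stg_06` is a chart that could not be processed; the function name `summarize_run` is hypothetical.

```python
from pathlib import Path
from typing import Dict, List


def _file_names(directory: Path, pattern: str) -> List[str]:
    """Return sorted names of regular files in directory that match pattern."""
    if not directory.is_dir():
        return []
    return sorted(p.name for p in directory.glob(pattern) if p.is_file())


def summarize_run(corpus_flow: Path) -> Dict[str, List[str]]:
    """List the index files in stg_05 and the unprocessed charts in stg_06."""
    root = Path(corpus_flow)
    return {
        "index_files": _file_names(root / "stg_05", "*.csv"),
        "failed_pdfs": _file_names(root / "stg_06", "*"),
    }
```

Comparing the number of entries under `index_files` and `failed_pdfs` against the number of charts placed in `input_pdf_charts` gives a quick check that every chart was accounted for.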
Below is a small section of a page in the PDF chart -
Below is the text extracted. This is an intermediate output which gets deleted after final output is created.
Below is one row of the final output csv file -