This repo focuses on extracting data from a PDF and converting it into JSON/dictionary form
Python installed on your PC, preferably Python 3 (for example Python 3.5)
One .pdf data file; the example used here is WHandbook.pdf
Install pdfminer3k for Python 3 with "pip install pdfminer3k"
If you use "pip install pdfminer" instead, some changes in the code are required
Open Command Prompt
cd to the location of the code and WHandbook.pdf
type "python pdfmine.py"
THE CODE WILL START
The final output for WHandbook.pdf, in LTTextLine or LTTextBox format, will be written to convertedFile.txt
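For reference, this step can be sketched roughly as below. This is only an illustration of the usual pdfminer3k layout-analysis pattern, not the actual contents of pdfmine.py; the function names extract_text_blocks and save_blocks are made up for this example.

```python
def extract_text_blocks(pdf_path):
    """Return the text of every LTTextBox / LTTextLine found in the PDF."""
    # Imported inside the function so the rest of the sketch can be used
    # even when pdfminer3k is not installed.
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBox, LTTextLine

    blocks = []
    with open(pdf_path, "rb") as fh:
        # pdfminer3k links the parser and the document object explicitly.
        parser = PDFParser(fh)
        document = PDFDocument()
        parser.set_document(document)
        document.set_parser(parser)
        document.initialize("")  # empty password

        resource_manager = PDFResourceManager()
        device = PDFPageAggregator(resource_manager, laparams=LAParams())
        interpreter = PDFPageInterpreter(resource_manager, device)

        for page in document.get_pages():
            interpreter.process_page(page)
            # The aggregated layout of the page is iterable over its elements.
            for element in device.get_result():
                if isinstance(element, (LTTextBox, LTTextLine)):
                    blocks.append(element.get_text())
    return blocks


def save_blocks(blocks, out_path):
    """Write one text block per entry to out_path and return the block count."""
    with open(out_path, "w", encoding="utf-8") as out:
        for block in blocks:
            # Normalise the trailing newline so blocks never run together.
            out.write(block.rstrip("\n") + "\n")
    return len(blocks)


# Hypothetical usage with the example file from this README:
# save_blocks(extract_text_blocks("WHandbook.pdf"), "convertedFile.txt")
```

The two functions are kept separate so the file-writing part can be checked without a PDF at hand.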
Run extract.py only after running pdfmine.py
Open Command Prompt
cd to the location of the code and the convertedFile.txt file
type "python extract.py"
THE CODE WILL START
The final output dictionary will be printed and stored in JSON format in saerch.txt
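A minimal sketch of turning extracted text lines into a dictionary and saving it as JSON is shown below. It assumes, purely for illustration, that the text contains "key: value" lines; the real rules used by extract.py may differ, and the names lines_to_dict and save_as_json are hypothetical.

```python
import json


def lines_to_dict(lines, sep=":"):
    """Build a dictionary from lines of the form 'key<sep>value'.

    Lines without the separator, or with an empty key, are skipped.
    If a key appears more than once, the last value wins.
    """
    result = {}
    for line in lines:
        key, found, value = line.partition(sep)
        key = key.strip()
        if found and key:
            result[key] = value.strip()
    return result


def save_as_json(data, out_path):
    """Store the dictionary in JSON format at out_path."""
    with open(out_path, "w", encoding="utf-8") as fh:
        json.dump(data, fh, indent=2, ensure_ascii=False)


# Hypothetical usage with the files named in this README:
# with open("convertedFile.txt", encoding="utf-8") as fh:
#     data = lines_to_dict(fh)
# print(data)
# save_as_json(data, "saerch.txt")
```

Only the first separator on a line splits key from value, so values that themselves contain a colon (such as times or URLs) are kept whole.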