Extractmolecule

This repo focuses on extracting data from a PDF and converting it into JSON/dictionary form.

Requirements

NOTE: Your data files and the code should be in the same folder.

Python installed on your PC, preferably Python 3 (for example, Python 3.5)

One .pdf data file; the example used here is WHandbook.pdf

Install pdfminer3k for Python 3 with "pip install pdfminer3k"

For Python 2.7

Install pdfminer instead with "pip install pdfminer"; some changes to the code are also required.

For running pdfmine.py (Data extraction)

Open Command Prompt

cd to the location of the code and WHandbook.pdf

Type "python pdfmine.py"

The script will start running.

The final output for WHandbook.pdf, in LTTextLine or LTTextBox format, is written to convertedFile.txt
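As a rough illustration of the extraction step, the sketch below walks the layout objects of each page and writes the text of every LTTextBox or LTTextLine to a file. It is not the actual contents of pdfmine.py. It assumes the pdfminer3k API, and the function names extract_text_blocks and save_blocks are hypothetical.

```python
def extract_text_blocks(pdf_path, password=""):
    """Yield the text of each LTTextBox / LTTextLine found in the PDF.

    Assumes the pdfminer3k package; other pdfminer variants use a
    different API.
    """
    # Imported inside the function so this file still loads on machines
    # where pdfminer3k has not been installed yet.
    from pdfminer.pdfparser import PDFParser, PDFDocument
    from pdfminer.pdfinterp import PDFResourceManager, PDFPageInterpreter
    from pdfminer.converter import PDFPageAggregator
    from pdfminer.layout import LAParams, LTTextBox, LTTextLine

    with open(pdf_path, "rb") as fp:
        parser = PDFParser(fp)
        doc = PDFDocument()
        parser.set_document(doc)
        doc.set_parser(parser)
        doc.initialize(password)

        rsrcmgr = PDFResourceManager()
        device = PDFPageAggregator(rsrcmgr, laparams=LAParams())
        interpreter = PDFPageInterpreter(rsrcmgr, device)

        for page in doc.get_pages():
            interpreter.process_page(page)
            # The aggregator returns the layout tree for the page just processed.
            for obj in device.get_result():
                if isinstance(obj, (LTTextBox, LTTextLine)):
                    yield obj.get_text()


def save_blocks(blocks, out_path="convertedFile.txt"):
    """Write each text block to out_path and return how many were written."""
    count = 0
    with open(out_path, "w", encoding="utf-8") as out:
        for text in blocks:
            out.write(text)
            count += 1
    return count


# Example call (needs WHandbook.pdf in the current folder):
# save_blocks(extract_text_blocks("WHandbook.pdf"))
```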

For running extract.py (Data Conversion)

Run this after running pdfmine.py

Open Command Prompt

cd to the location of the code and the convertedFile.txt file

Type "python extract.py"

The script will start running.

The final output dictionary is printed and stored in JSON format in saerch.txt
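The last step, printing a dictionary and storing it as JSON, can be sketched as follows. This is only an illustration and not the code in extract.py. The function names are hypothetical, and the sample entry is a made-up placeholder rather than real handbook data.

```python
import json


def save_dictionary(data, out_path="saerch.txt"):
    """Print the dictionary as JSON and store the same text in out_path."""
    text = json.dumps(data, indent=2, sort_keys=True)
    print(text)
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(text)
    return text


def load_dictionary(path="saerch.txt"):
    """Read the stored JSON back into a Python dictionary."""
    with open(path, encoding="utf-8") as src:
        return json.load(src)


# Placeholder data, for illustration only:
# save_dictionary({"example compound": {"field": "value"}})
```

Reading the file back with load_dictionary is a quick way to check that the stored output is valid JSON.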

newextract.py handles a different kind of division between compounds. For better output, use Ctrl+H (find and replace) to replace every "." with a space.
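The find-and-replace step can also be done in Python instead of an editor. The sketch below is a minimal version. It assumes the text to clean is a plain text file such as convertedFile.txt, which the README does not state explicitly, and the function names are hypothetical.

```python
def replace_dots(text):
    """Return text with every "." replaced by a single space."""
    return text.replace(".", " ")


def clean_file(in_path, out_path):
    """Copy in_path to out_path with every "." replaced by a space."""
    with open(in_path, encoding="utf-8") as src:
        cleaned = replace_dots(src.read())
    with open(out_path, "w", encoding="utf-8") as out:
        out.write(cleaned)
    return cleaned


# Example call (assumed file names):
# clean_file("convertedFile.txt", "convertedFile_clean.txt")
```

Note that this also replaces the decimal point inside numbers, exactly as a blanket Ctrl+H replacement would.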
