PDF2TXT

PDF2TXT can be used to either convert a single .pdf file to a .txt file or all .pdf files in a given directory to .txt files.

Installation

From within a Python 3 virtual environment, clone the repository to install PDF2TXT:

git clone https://github.com/NLPatVCU/PDF2TXT.git

You will also need to install the Haystack framework and the Milvus Python client (pymilvus):

pip3 install pymilvus==1.0.0
pip3 install farm-haystack==1.0.0
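
To confirm that the pinned versions above are the ones active in the virtual environment, a small check along the following lines may help. This is only a sketch: the helper name `check_pinned` is hypothetical and not part of PDF2TXT, and it relies solely on the standard-library `importlib.metadata` module (Python 3.8 or later).

```python
from importlib import metadata

# Versions pinned by the pip3 commands in this README.
PINNED = {"pymilvus": "1.0.0", "farm-haystack": "1.0.0"}


def check_pinned(pinned=PINNED):
    """Return {package: (expected, installed)} for every package whose
    installed version differs from the pinned one. `installed` is None
    when the package is not installed at all."""
    problems = {}
    for name, expected in pinned.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            problems[name] = (expected, installed)
    return problems


if __name__ == "__main__":
    for name, (expected, installed) in check_pinned().items():
        found = installed or "not installed"
        print(f"{name}: expected {expected}, found {found}")
```

If the script prints nothing, both packages are installed at the pinned versions.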

If you run into difficulties installing Haystack, see its repository: https://github.com/deepset-ai/haystack

Use

To convert a single file, run:

python3 pdf2txt.py -f <input_file_path>

To convert an entire directory, run:

python3 pdf2txt.py -d <input_directory_path>

To write the output files to a specific directory, append:

-o <output_directory_path>
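
When the conversion is driven from another Python script, the options above can be assembled programmatically. The following is a minimal sketch, assuming `-o` can be combined with either `-f` or `-d` as described above. The helper names `build_pdf2txt_command` and `convert`, and the example paths, are hypothetical and not part of PDF2TXT. The sketch uses `sys.executable` in place of `python3` so that the interpreter of the active virtual environment is used.

```python
import subprocess
import sys
from pathlib import Path


def build_pdf2txt_command(input_path, output_dir=None, script="pdf2txt.py"):
    """Build the argument list for one pdf2txt.py run.

    Uses -d when input_path is an existing directory, otherwise -f.
    Appends -o only when an output directory is given.
    """
    path = Path(input_path)
    flag = "-d" if path.is_dir() else "-f"
    cmd = [sys.executable, script, flag, str(path)]
    if output_dir is not None:
        cmd += ["-o", str(output_dir)]
    return cmd


def convert(input_path, output_dir=None):
    """Run pdf2txt.py and raise CalledProcessError if it exits non-zero."""
    subprocess.run(build_pdf2txt_command(input_path, output_dir), check=True)


if __name__ == "__main__":
    # Hypothetical paths; this only prints the command, it does not run it.
    print(" ".join(build_pdf2txt_command("papers", output_dir="txt_out")))
```

Calling `convert("papers", "txt_out")` from the cloned repository directory would then run the conversion. Whether pdf2txt.py creates a missing output directory is not stated here, so creating it beforehand is the safer choice.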

License

This package is licensed under the GNU General Public License.

Acknowledgments
