Data-Scraper

A data-scraper that makes it possible to filter out the most important information from huge amounts of text based data.

The script asks for a keyword to search for. It compares the keyword with the file-name and its contents. As soon as it finds the keyword in it, it is listed as a match and output at the end.

File Content Read

The scraper is able to read only the following text-based files:

.docx
.pdf
.txt

Usage

The scraper is searching the ./DATA directory by default. To change that you have to edit the variable directory.

Line 9: directory = "./DATA"

Note

It iterates through every file in the directory. To speed up the process, it is recommended to limit the amount of files.

Requirements

How to install the required libraries.

pip install pdfplumber

pip install docx

Improving

Suggestions for improvements are welcome.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

README.md

README.md

Data-Scraper

File Content Read

Usage

Requirements

Improving

Files

README.md

Latest commit

History

README.md

File metadata and controls

Data-Scraper

File Content Read

Usage

Requirements

Improving