Scraping-Resources

Python 3.9 License: GPL v3

What is it?

Scraping Resources is a one-stop shop for collecting data from individual taps. Currently supported taps are Reddit and Google RSS Feed. Refer to each tap's linked page for details on using its class.

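Create a virtual environment and install the dependencies with the following commands. The activate path shown is the Windows layout; on Linux or macOS the script is at <venv name>/bin/activate.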

virtualenv <venv name>
source <venv name>/Scripts/activate
pip install -r requirements.txt 

To leave the virtual environment, use the following command (it takes no arguments):

deactivate

Quick Start

Clone the repository and use the following command to run the project:

python3 test.py

Note: Add string elements to the query parameter to change the search results returned from the Google RSS Feed. Similarly, changing the sub and queue parameters (queue is restricted to 'submissions' or 'comments') will affect the Reddit scraping.
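
The constraints in the note above can be checked before a run. The following is a minimal sketch, not code from this repository: `check_settings` and `ALLOWED_QUEUES` are hypothetical names, and it assumes only what the note states, namely that query is a collection of strings and queue is either 'submissions' or 'comments'.

```python
# Hypothetical helper, not part of this repository.
ALLOWED_QUEUES = ("submissions", "comments")  # the two values named in the note


def check_settings(query, sub, queue):
    """Validate the settings described in the note and return them as a dict."""
    # A bare string would be iterated character by character, so reject it.
    if isinstance(query, str):
        raise ValueError("query must be a list of strings, not a single string")
    if not all(isinstance(term, str) and term for term in query):
        raise ValueError("query must contain only non-empty strings")
    if not isinstance(sub, str) or not sub:
        raise ValueError("sub must be a non-empty string")
    if queue not in ALLOWED_QUEUES:
        raise ValueError(f"queue must be one of {ALLOWED_QUEUES}, got {queue!r}")
    return {"query": list(query), "sub": sub, "queue": queue}
```

Calling `check_settings(["python"], "learnpython", "comments")` returns the settings unchanged, while an unsupported queue value raises `ValueError` before any scraping starts.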

Data Output

Each tap has its own Logs and Data folders containing JSON-formatted files. An example of the data output for each tap can be found here:

| Tap | Sample Output |
| --- | --- |
| Reddit | output |
| Google RSS | output |
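
Since each tap writes JSON files to a Data folder, the output can be read back with the standard library. This is a minimal sketch under assumptions: `load_tap_output` is a hypothetical helper, and the folder path and file layout (one JSON document per `.json` file, directly inside the folder) are guesses rather than documented behavior of this project.

```python
import json
from pathlib import Path


def load_tap_output(data_dir):
    """Return {file name: parsed JSON} for each .json file directly in data_dir."""
    records = {}
    # sorted() gives a stable order; non-JSON files in the folder are ignored.
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records[path.name] = json.load(handle)
    return records
```

For example, `load_tap_output("Reddit/Data")` would return one entry per JSON file, assuming the Reddit tap's Data folder sits at that path relative to the working directory.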

License

This project is licensed under the GPL v3 License.