Scraping Resources is a one-stop shop for data from individual taps. Currently supported taps include Reddit and Google RSS Feed. Refer to the previous links for more detailed usage of each class.
Install the dependencies for deployment using the following commands:
virtualenv <venv name>
source <venv name>/Scripts/activate
pip install -r requirements.txt
To leave the virtual environment, use the following command:
deactivate
Clone the repository and use the following command to run the project:
python3 test.py
Note: Add string elements to the `query` parameter to change the searched results within the Google RSS Feed. Similarly, changing the `sub` and `queue` parameters (`queue` is restricted to 'submissions' or 'comments') will affect the Reddit scraping.
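As an illustration of the note above, the following minimal sketch shows how these parameters might be validated before a scrape is started. The function name `build_scrape_config` and the returned dictionary layout are hypothetical and not taken from this project, and reading `sub` as a subreddit name is an assumption. Only the parameter names `query`, `sub`, and `queue`, and the two allowed `queue` values, come from the note.

```python
# Hypothetical helper for illustration; not part of this repository's API.
ALLOWED_QUEUES = ("submissions", "comments")  # the two values named in the note


def build_scrape_config(query, sub, queue):
    """Return a validated settings dict for both taps (illustrative layout)."""
    # The note says to add string elements to `query`, so treat it as a
    # list of strings; a single string is wrapped for convenience.
    if isinstance(query, str):
        query = [query]
    query = list(query)
    if not all(isinstance(term, str) and term for term in query):
        raise ValueError("query must contain only non-empty strings")
    # Assumption: `sub` is the name of the subreddit to scrape.
    if not isinstance(sub, str) or not sub:
        raise ValueError("sub must be a non-empty string")
    if queue not in ALLOWED_QUEUES:
        raise ValueError(f"queue must be one of {ALLOWED_QUEUES}, got {queue!r}")
    return {
        "google_rss": {"query": query},
        "reddit": {"sub": sub, "queue": queue},
    }


if __name__ == "__main__":
    # Example values are made up for this sketch.
    print(build_scrape_config(["python", "web scraping"], "learnpython", "comments"))
```

Rejecting an unsupported `queue` value up front gives a clear error before any network request is made.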
Each tap has its own Logs and Data folders containing JSON-formatted files. An example of the data output from the application can be found here:
Tap | Sample Output |
---|---|
Reddit | output |
Google RSS | output |
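For readers who want a sense of how a tap could write its JSON output into a Data folder, here is a minimal sketch. The function `save_records`, the `<base_dir>/<tap_name>/Data` layout, the timestamped file name, and the sample record are all assumptions made for illustration; the project's actual folder layout and file naming may differ.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def save_records(base_dir, tap_name, records):
    """Write records as one JSON file under <base_dir>/<tap_name>/Data.

    The folder layout and file naming are assumptions made for this sketch.
    """
    data_dir = Path(base_dir) / tap_name / "Data"
    data_dir.mkdir(parents=True, exist_ok=True)
    # A UTC timestamp in the file name keeps separate runs apart.
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    out_path = data_dir / f"{tap_name}_{stamp}.json"
    with out_path.open("w", encoding="utf-8") as fh:
        json.dump(records, fh, ensure_ascii=False, indent=2)
    return out_path


if __name__ == "__main__":
    # The record below is invented sample data, not real scraper output.
    path = save_records("output", "reddit", [{"id": 1, "title": "hello"}])
    print(f"wrote {path}")
```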
This project uses the GPL v3 License.