Scraping-Resources

What is it?

Scraping Resources is a one-stop shop for collecting data from individual taps. Currently supported taps are Reddit and Google RSS Feed. Refer to the links for each tap for more detail on how to use each class.

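Install the dependencies in a virtual environment using the following commands. The activation path shown is the Windows layout (for example, under Git Bash); on Linux or macOS, use <venv name>/bin/activate instead.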

virtualenv <venv name>
source <venv name>/Scripts/activate
pip install -r requirements.txt

To leave the virtual environment, use the following command (it takes no arguments):

deactivate

Quick Start

Clone the repository and use the following command to run the project:

python3 test.py

Note: Add string elements to the query parameter to change the search results returned from the Google RSS Feed. Similarly, changing the sub and queue parameters (queue is restricted to 'submissions' or 'comments') will affect the Reddit scraping.
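As a rough illustration of the restriction in the note above, the sketch below checks the Reddit settings before a run. The helper name build_reddit_settings and the dictionary layout are hypothetical and are not taken from this repository; only the two allowed queue values come from the note.

```python
# Hypothetical sketch: not part of this repository's API.
ALLOWED_QUEUES = ("submissions", "comments")  # the two values named in the note


def build_reddit_settings(sub, queue):
    """Bundle Reddit tap settings after validating them."""
    if not sub:
        raise ValueError("sub must be a non-empty subreddit name")
    if queue not in ALLOWED_QUEUES:
        raise ValueError(f"queue must be one of {ALLOWED_QUEUES}, got {queue!r}")
    return {"sub": sub, "queue": queue}
```

Validating up front means a mistyped queue value fails immediately with a clear message instead of partway through a scrape.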

Data Output

Each tap has its own Logs and Data folders containing JSON-formatted files. Examples of the data output from the application can be found here:

Tap	Sample Output
Reddit	output
Google RSS	output
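Since the output is plain JSON, it can be read back with the Python standard library. The following is a minimal sketch, assuming the files in a tap's Data folder use a .json extension; the function name load_tap_output and the folder path in the usage note are illustrative assumptions, not names taken from this repository.

```python
import json
from pathlib import Path


def load_tap_output(data_dir):
    """Return {file name: parsed JSON} for every .json file in data_dir."""
    records = {}
    # Sort the paths so files are read in a stable, predictable order.
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            records[path.name] = json.load(handle)
    return records
```

For example, load_tap_output("Reddit/Data") would return one entry per JSON file, assuming that is where the Reddit tap writes its data.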

License

This project is licensed under the GPL v3 License.
