lovendatj/Scraping-Resources

Scraping-Resources

Python 3.9 License: GPL v3

What is it?

Scraping Resources is a one-stop shop for collecting data from individual taps. Currently supported taps are Reddit and Google RSS Feed. See each tap's linked page for details on using its class.

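Create a virtual environment and install the dependencies with the following commands. The activate path shown is the Windows layout (for example, under Git Bash); on Linux or macOS the script is at <venv name>/bin/activate.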

virtualenv <venv name>
source <venv name>/Scripts/activate
pip install -r requirements.txt 

To leave the virtual environment, run the following command (it takes no environment name):

deactivate

Quick Start

Clone the repository and use the following command to run the project:

python3 test.py

Note: Add string elements to the query parameter to change the search results returned from the Google RSS Feed. Similarly, changing the sub and queue parameters (queue is restricted to 'submissions' or 'comments') will affect the Reddit scraping.
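The parameters in the note can be pictured with a small sketch. `TapSettings`, `ALLOWED_QUEUES`, and the default values are hypothetical names made up for illustration and are not taken from this repository; only the parameter names `query`, `sub`, and `queue`, and the two allowed queue values, come from the note.

```python
from dataclasses import dataclass, field

# The two queue values the note allows for the Reddit tap.
ALLOWED_QUEUES = ("submissions", "comments")


@dataclass
class TapSettings:
    """Hypothetical container for the parameters described in the note."""

    query: list = field(default_factory=list)  # search strings for the Google RSS Feed
    sub: str = "python"                        # subreddit name (placeholder default)
    queue: str = "submissions"                 # must be one of ALLOWED_QUEUES

    def __post_init__(self):
        # Reject any queue value other than the two the note permits.
        if self.queue not in ALLOWED_QUEUES:
            raise ValueError(
                f"queue must be one of {ALLOWED_QUEUES}, got {self.queue!r}"
            )
        # Trim whitespace and drop empty query strings.
        self.query = [q.strip() for q in self.query if q.strip()]
```

Checking the values up front like this would surface a mistyped queue before any request is made. How test.py actually passes these parameters may differ.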

Data Output

Each tap has its own Logs and Data folders containing JSON-formatted files. Sample output from the application is linked below:

| Tap | Sample Output |
| --- | --- |
| Reddit | output |
| Google RSS | output |
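Since each tap writes JSON files to a Data folder, the output can be read back with the standard library alone. The sketch below is a minimal illustration: `load_tap_output` is a hypothetical helper, and the folder path in the usage comment is an assumption about the layout, not a documented location.

```python
import json
from pathlib import Path


def load_tap_output(data_dir):
    """Read every *.json file in data_dir into a dict keyed by file name."""
    records = {}
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as fh:
            records[path.name] = json.load(fh)
    return records


# Example usage (the path is an assumed layout; adjust it to your checkout):
# reddit_data = load_tap_output("Reddit/Data")
# for name, payload in reddit_data.items():
#     print(name, type(payload).__name__)
```

The schema of each file depends on the tap, so the helper returns the parsed JSON as-is rather than assuming particular fields.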

License

This project is licensed under the GPL v3 License.
