Scraping-Resources

What is it?

Scraping Resources is a one-stop shop for collecting data from individual taps. Currently supported taps are Reddit and Google RSS Feed. Refer to each tap's linked documentation for more detail on using its class.

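To deploy, create a virtual environment and install the dependencies with the following commands. The activation path shown is the Windows layout (for example under Git Bash); on Linux or macOS the script is at <venv name>/bin/activate instead.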

virtualenv <venv name>
source <venv name>/Scripts/activate
pip install -r requirements.txt

To leave the virtual environment, use the following command (it takes no arguments):

deactivate

Quick Start

Clone the repository and use the following command to run the project:

python3 test.py

Note: Add string elements to the query parameter to change the results searched for in the Google RSS Feed. Similarly, changing the sub and queue parameters (queue is restricted to 'submissions' or 'comments') will affect the Reddit scraping.
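
The parameters in the note can be pictured with the sketch below. It is illustrative only: `build_settings`, `ALLOWED_QUEUES`, and the dictionary layout are hypothetical names, not the actual interface of test.py or the tap classes. Only the `query`, `sub`, and `queue` parameter names and the two allowed queue values come from the note above, and the example values are made up.

```python
# Hypothetical sketch: these helpers are NOT part of this repository.
# They only illustrate the constraints described in the note above.
ALLOWED_QUEUES = ("submissions", "comments")


def build_settings(query, sub, queue):
    """Validate and bundle the search parameters for both taps."""
    # query is expected to be a collection of string elements, not one string.
    if isinstance(query, str) or not all(isinstance(q, str) for q in query):
        raise TypeError("query must be a list of strings")
    # queue may only be 'submissions' or 'comments'.
    if queue not in ALLOWED_QUEUES:
        raise ValueError(f"queue must be one of {ALLOWED_QUEUES}, got {queue!r}")
    return {
        "google_rss": {"query": list(query)},
        "reddit": {"sub": sub, "queue": queue},
    }


if __name__ == "__main__":
    # Example values are placeholders chosen for illustration.
    settings = build_settings(
        query=["python", "web scraping"],
        sub="learnpython",
        queue="comments",
    )
    print(settings)
```

Validating the queue value up front, as sketched here, would surface a misspelled option before any request is made; whether the real taps do this is not stated in this README.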

Data Output

Each tap has its own Logs and Data folders containing JSON-formatted files. An example of the data output from the application can be found here:

Tap	Sample Output
Reddit	output
Google RSS	output
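
To inspect a tap's results after a run, the JSON files can be read with the Python standard library. This is a minimal sketch, assuming each tap's Data folder holds files ending in `.json`; the path `output/Reddit/Data` and the helper name `load_tap_data` are assumptions made for illustration, so adjust them to the actual folder layout.

```python
import json
from pathlib import Path


def load_tap_data(data_dir):
    """Return {file name: parsed JSON} for every .json file in data_dir."""
    results = {}
    # sorted() keeps the order stable across runs and platforms.
    for path in sorted(Path(data_dir).glob("*.json")):
        with path.open(encoding="utf-8") as handle:
            results[path.name] = json.load(handle)
    return results


if __name__ == "__main__":
    # Assumed location; change it to wherever the tap writes its Data folder.
    for name, payload in load_tap_data("output/Reddit/Data").items():
        print(name, type(payload).__name__)
```

Files without a `.json` extension are skipped, and an empty or missing folder simply yields an empty dictionary.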

License

This project uses the GPL v3 License.

