
What it does

This program takes a list of subreddits and a list of words and searches for each word in each subreddit. It then takes the top n results (the sort method and n are set by the user) and scrapes the post and comment text. The user can set a maximum number of comments to scrape. Finally, it outputs word frequency lists and a word cloud.
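As a rough illustration of the last step, here is a minimal sketch of counting word frequencies across scraped text while skipping noise words. The function name word_frequencies and its parameters are hypothetical and are not taken from redditScraper.py; the real script may tokenize and count differently.

```python
import re
from collections import Counter


def word_frequencies(texts, noise_words=(), top_n=None):
    """Count words across texts, ignoring case and any noise words.

    Hypothetical helper for illustration only; not the script's own code.
    """
    noise = {w.lower() for w in noise_words}
    counts = Counter()
    for text in texts:
        # Treat runs of letters and apostrophes as words.
        for word in re.findall(r"[a-z']+", text.lower()):
            word = word.strip("'")
            if word and word not in noise:
                counts[word] += 1
    # List of (word, count) pairs, most frequent first.
    return counts.most_common(top_n)


if __name__ == "__main__":
    posts = ["The cat sat", "the cat ran"]
    print(word_frequencies(posts, noise_words=["the"]))
```

A dict built from these pairs could then be passed to the wordcloud package's WordCloud.generate_from_frequencies to draw the cloud.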

Requirements:

requests
selenium
webdriver-manager
numpy
BeautifulSoup
pandas
matplotlib
pillow
wordcloud

Copy these commands to install each package:
pip install requests
pip install selenium
pip install webdriver-manager
pip install numpy
pip install beautifulsoup4
pip install pandas
pip install pillow
pip install matplotlib
pip install wordcloud
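To confirm that everything in the Requirements list is installed before running the scraper, a small check like the following may help. It is only a sketch: the REQUIRED mapping and the missing_packages function are hypothetical names, and the mapping reflects that some packages are imported under a different name than the one used with pip (beautifulsoup4 as bs4, pillow as PIL, webdriver-manager as webdriver_manager).

```python
from importlib.util import find_spec

# pip package name -> module name used in "import ..."
REQUIRED = {
    "requests": "requests",
    "selenium": "selenium",
    "webdriver-manager": "webdriver_manager",
    "numpy": "numpy",
    "beautifulsoup4": "bs4",
    "pandas": "pandas",
    "matplotlib": "matplotlib",
    "pillow": "PIL",
    "wordcloud": "wordcloud",
}


def missing_packages(required=REQUIRED):
    """Return the pip names of packages whose module cannot be found."""
    return [pip_name for pip_name, module in required.items()
            if find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```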

How to use:

The only files you need are redditScraper.py, request.csv, and noiseWords.csv. The others will be written by the program.

Edit request.csv to gather the desired results. Column 1 is the list of subreddits to crawl; column 2 is the list of words to search for in each subreddit.
The number of search results can be set in main().
The sort method can be set in generateURL(), in the variable named sort.
Edit noiseWords.csv to filter unwanted words out of the results.
The wordCloud function contains the word cloud size and color settings.
Run redditScraper.py.
The scraper is currently not very efficient, so please be patient.
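The way request.csv drives the searches can be sketched as follows. This is an illustration under assumptions, not the code in redditScraper.py: read_request and build_search_urls are hypothetical names, the file is assumed to have no header row, and the search URL pattern is an assumed form of Reddit's subreddit search that may not match what generateURL() builds.

```python
import csv
import io
from urllib.parse import quote, urlencode


def read_request(csv_file):
    """Read subreddits from column 1 and search words from column 2.

    Assumes no header row; empty cells are skipped so the two
    columns may have different lengths.
    """
    subreddits, words = [], []
    for row in csv.reader(csv_file):
        if len(row) > 0 and row[0].strip():
            subreddits.append(row[0].strip())
        if len(row) > 1 and row[1].strip():
            words.append(row[1].strip())
    return subreddits, words


def build_search_urls(subreddits, words, sort="top"):
    """Build one search URL per (subreddit, word) pair.

    The URL pattern here is an assumption made for illustration.
    """
    urls = []
    for sub in subreddits:
        for word in words:
            query = urlencode({"q": word, "restrict_sr": 1, "sort": sort})
            urls.append(f"https://www.reddit.com/r/{quote(sub)}/search/?{query}")
    return urls


if __name__ == "__main__":
    # In-memory stand-in for request.csv.
    sample = io.StringIO("python,scraper\nlearnpython,\n")
    subs, terms = read_request(sample)
    for url in build_search_urls(subs, terms, sort="new"):
        print(url)
```

Every word is searched in every subreddit, so the number of searches grows as the product of the two column lengths, which is one reason a run can take a while.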

justinchiao/redditScraper
