This program takes a list of subreddits and a list of words and searches for each word in each subreddit. It then takes the top n results (both the sort method and n are set by the user) and scrapes the post and comment text. The user can set a maximum number of comments to scrape. Finally, it outputs word frequency lists and a word cloud.
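As a rough illustration of the "each word in each subreddit" step, the sketch below builds one search URL per (subreddit, word) pair. The function name `build_search_urls`, the default sort value, and the URL layout with `restrict_sr` are assumptions made for this example; they are not taken from redditScraper.py, whose generateURL() may work differently.

```python
from itertools import product
from urllib.parse import urlencode


def build_search_urls(subreddits, words, sort="top"):
    """Return one search URL per (subreddit, word) pair.

    Hypothetical helper: the URL layout below is an assumption about
    Reddit's search page, not the script's actual generateURL().
    """
    urls = []
    # product() yields every combination: each word in each subreddit.
    for subreddit, word in product(subreddits, words):
        # urlencode escapes spaces and special characters in the word.
        query = urlencode({"q": word, "restrict_sr": 1, "sort": sort})
        urls.append(f"https://www.reddit.com/r/{subreddit}/search/?{query}")
    return urls


if __name__ == "__main__":
    for url in build_search_urls(["python", "learnpython"], ["gpu", "laptop"]):
        print(url)
```

Two subreddits and two words give four URLs, so the number of searches grows as the product of the two list lengths.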
- requests
- selenium
- webdriver-manager
- numpy
- BeautifulSoup
- pandas
- matplotlib
- pillow
- wordcloud
Copy these commands to install each package:
pip install requests
pip install selenium
pip install webdriver-manager
pip install numpy
pip install beautifulsoup4
pip install pandas
pip install pillow
pip install matplotlib
pip install wordcloud
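To confirm that the packages are available before running the scraper, a small check like the following may help. It is only a sketch: the helper name `missing_packages` is made up for this example, and the list uses the import names that correspond to the packages above (for instance `bs4` for beautifulsoup4 and `PIL` for pillow).

```python
from importlib.util import find_spec

# Import names for the packages listed above (not the pip names).
REQUIRED_MODULES = [
    "requests", "selenium", "webdriver_manager", "numpy",
    "bs4", "pandas", "matplotlib", "PIL", "wordcloud",
]


def missing_packages(modules):
    """Return the module names that cannot be found in this environment."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```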
The only files you need are redditScraper.py, request.csv, and noiseWords.csv. The others will be written by the program.
- Edit request.csv to choose what is gathered. Column 1 is the list of subreddits to crawl; column 2 is the list of words to search for in each subreddit.
- The number of search results can be set in main().
- The sort method can be set in generateURL(), in the variable named sort.
- noiseWords.csv can be edited to filter unwanted words out of the results.
- The wordCloud function contains the word cloud size and color settings.
- Run redditScraper.py.
- The program is currently not very efficient, so please be patient.
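The two CSV files described above can be pictured with the following sketch, which reads a two-column request and counts word frequencies while skipping noise words. It is an illustration under assumptions, not the code in redditScraper.py: the helper names `read_request` and `word_frequencies` are hypothetical, the CSV is assumed to have no header row, and the tokenizing rule (lowercase letters and apostrophes) is chosen only for this example.

```python
import csv
import io
import re
from collections import Counter


def read_request(csv_text):
    """Split two-column CSV text into (subreddits, words).

    Assumes no header row; the columns may have different lengths,
    so empty cells are skipped.
    """
    subreddits, words = [], []
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) > 0 and row[0].strip():
            subreddits.append(row[0].strip())
        if len(row) > 1 and row[1].strip():
            words.append(row[1].strip())
    return subreddits, words


def word_frequencies(text, noise_words):
    """Count words in text, ignoring case and dropping noise words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    noise = {w.lower() for w in noise_words}
    return Counter(t for t in tokens if t not in noise)


if __name__ == "__main__":
    subs, terms = read_request("python,gpu\nlearnpython,laptop\n,monitor\n")
    print(subs, terms)
    counts = word_frequencies("The GPU is fast. The gpu is loud!", ["the", "is"])
    print(counts.most_common(3))
```

In the real program the text would come from the scraped posts and comments, and the resulting counts would feed the frequency lists and the word cloud.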