GitHub - rjshanahan/facebook_m_scraper: webscraper for facebook group pages

fb webscraper for groups and pages

Python web scraper using Selenium and BeautifulSoup modules to extract text from various fb groups and pages.

The program uses Selenium (with ChromeDriver) to automate user behaviour within a browser session: it logs in to the Facebook mobile site, then either expands the collapsible sections for 2015 or loads data by dynamic scrolling. Once the pages are rendered, the HTML is extracted and parsed with BeautifulSoup. Note: fb are smart so this may be a little flaky, but it seems to work OK for now.
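The scroll-then-parse flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in `facebook_m_selenium_scraper.py`: the function names `scroll_until_stable` and `render_and_parse` are hypothetical, and the login and section-expansion steps are left out because the page selectors they would need are not given here. The Selenium and BeautifulSoup imports sit inside `render_and_parse` so that the scrolling helper can be tried without a browser.

```python
import time


def scroll_until_stable(get_height, scroll_down, pause=1.0, max_rounds=20):
    """Scroll until the page height stops growing; return how many scrolls ran."""
    last_height = get_height()
    rounds = 0
    while rounds < max_rounds:
        scroll_down()
        rounds += 1
        time.sleep(pause)  # give dynamically loaded content time to appear
        new_height = get_height()
        if new_height == last_height:
            break  # nothing new was loaded, so assume the end of the feed
        last_height = new_height
    return rounds


def render_and_parse(url):
    """Render `url` in Chrome, scroll to the bottom, and return the page text."""
    # Imported lazily (assumption: selenium and beautifulsoup4 are installed).
    from bs4 import BeautifulSoup
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        driver.get(url)
        scroll_until_stable(
            lambda: driver.execute_script("return document.body.scrollHeight"),
            lambda: driver.execute_script(
                "window.scrollTo(0, document.body.scrollHeight);"
            ),
        )
        html = driver.page_source  # the rendered HTML, after scrolling
    finally:
        driver.quit()

    soup = BeautifulSoup(html, "html.parser")
    return soup.get_text(" ", strip=True)
```

Passing the height and scroll actions in as callables keeps the loop independent of the browser, which makes it easier to cap the number of rounds when the page keeps loading content.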

This program will extract the following and output to a CSV file with punctuation and other non-text characters removed:

full post text from each page of facebook entries
date
header
url
user name
popularity metrics (a string containing likes/comments/shares)
like_fave: integer value for number of likes
share_rtwt: integer value for number of shares
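As a rough sketch of the output step, the snippet below strips punctuation from the post text, pulls integer counts out of the popularity string, and writes CSV rows. Only `like_fave` and `share_rtwt` are column names taken from the list above; the other column names, the helper names, and the assumed metrics format (`"12 Likes 3 Comments 1 Share"`) are illustrative guesses, not the script's actual behaviour.

```python
import csv
import io
import re

# Column names other than like_fave and share_rtwt are hypothetical.
FIELDS = ["text", "date", "header", "url", "user",
          "popularity", "like_fave", "share_rtwt"]


def clean_text(text):
    """Replace punctuation and other non-text characters with spaces."""
    no_punct = re.sub(r"[^\w\s]|_", " ", text)
    return re.sub(r"\s+", " ", no_punct).strip()


def count_for(metrics, keyword):
    """Return the integer that precedes `keyword` in `metrics`, or 0 if absent."""
    match = re.search(r"(\d+)\s+" + re.escape(keyword), metrics,
                      flags=re.IGNORECASE)
    return int(match.group(1)) if match else 0


def rows_to_csv(posts):
    """Serialise a list of post dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for post in posts:
        row = dict(post)
        row["text"] = clean_text(post["text"])
        # Assumed format: counts appear as "<number> <label>" in the string.
        row["like_fave"] = count_for(post["popularity"], "like")
        row["share_rtwt"] = count_for(post["popularity"], "share")
        writer.writerow(row)
    return buffer.getvalue()
```

In this sketch only the post text is cleaned, so URLs and dates keep their punctuation; whether the original script cleans every column is not stated here.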

Repository files:

README.md
facebook_m_selenium_scraper.py
notebook_facebook_m_scraper.ipynb
