Two scrapers that write Congressional bill data and Congressional member data (House and Senate) to a Postgres database. The scrapers were written using Scrapy, following this tutorial.
Used by my site https://betterknowyourdistrict.com/
- Install the Python dependencies:
pip install -r requirements.txt
- Create a settings.py file in each of the following directories: bill_scraper/scraper_app/ and member_scraper/scraper_app/.
- bill_scraper/scraper_app/settings.py:
    BOT_NAME = 'congressionalbills'
    SPIDER_MODULES = ['scraper_app.spiders']

    DATABASE = {
        'drivername': 'postgresql',  # SQLAlchemy 1.4+ removed the old 'postgres' alias
        'host': 'localhost',
        'port': '5432',
        'username': '[YOUR POSTGRES USER]',
        'password': '[YOUR POSTGRES PASSWORD]',
        'database': '[YOUR POSTGRES DB]',
    }

    # Scrapy runs pipelines in ascending order of these values (0-1000).
    ITEM_PIPELINES = {
        'scraper_app.pipelines.BillPipeline': 200,
    }

    CONGRESS = '[YOUR CONGRESS NUMBER]'
- member_scraper/scraper_app/settings.py:
    BOT_NAME = 'congressionalmembers'
    SPIDER_MODULES = ['scraper_app.spiders']

    DATABASE = {
        'drivername': 'postgresql',  # SQLAlchemy 1.4+ removed the old 'postgres' alias
        'host': 'localhost',
        'port': '5432',
        'username': '[YOUR POSTGRES USER]',
        'password': '[YOUR POSTGRES PASSWORD]',
        'database': '[YOUR POSTGRES DB]',
    }

    # Scrapy runs pipelines in ascending order of these values (0-1000).
    ITEM_PIPELINES = {
        'scraper_app.pipelines.MemberPipeline': 200,
    }
- Run each scraper from within its respective directory, bill_scraper or member_scraper.
- Bill scraper command:
scrapy crawl congressionalbills
- Member Scraper command:
scrapy crawl congressionalmembers
- Combine the ProPublica member information with the bulk data by running get_member_details.py after the XML bulk data scraper has finished.
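The DATABASE dict in each settings.py holds the parts of a database connection URL. Its drivername key suggests the pipelines hand it to SQLAlchemy, but that is an assumption here, and the repository's pipeline code may build the connection differently. A minimal standard-library sketch of how such a dict maps onto a URL string, using a hypothetical database_url helper and placeholder credentials:

```python
from urllib.parse import quote_plus


def database_url(db: dict) -> str:
    """Build a SQLAlchemy-style connection URL from a DATABASE settings dict.

    Hypothetical helper: assumes the same keys as the settings.py examples
    (drivername, host, port, username, password, database).
    """
    # Percent-encode credentials so characters such as '@' or ':'
    # are not mistaken for URL delimiters.
    user = quote_plus(db["username"])
    password = quote_plus(db["password"])
    return (
        f"{db['drivername']}://{user}:{password}"
        f"@{db['host']}:{db['port']}/{db['database']}"
    )


# Placeholder values for illustration only.
DATABASE = {
    "drivername": "postgresql",
    "host": "localhost",
    "port": "5432",
    "username": "scraper",
    "password": "p@ss:word",
    "database": "congress",
}

print(database_url(DATABASE))
```

With the placeholder values above this prints postgresql://scraper:p%40ss%3Aword@localhost:5432/congress; without the encoding, the @ in the password would be read as the start of the host.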
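The get_member_details.py step combines two sources of member records, and its actual logic isn't described here. The following is only a sketch of one common way to do such a merge: a left join on a shared member ID. The merge_member_details function, the bioguide_id field name, and the sample records are all hypothetical, not the script's real schema.

```python
def merge_member_details(bulk_members, propublica_members, key="bioguide_id"):
    """Left-join ProPublica fields onto bulk-data member records.

    Hypothetical sketch: the key name and record shapes are assumptions.
    """
    # Index the ProPublica records by member ID for constant-time lookups.
    details_by_id = {m[key]: m for m in propublica_members}
    merged = []
    for member in bulk_members:
        combined = dict(member)  # copy so the input records stay unchanged
        extra = details_by_id.get(member[key], {})
        # Only fill in fields the bulk record does not already have.
        for field, value in extra.items():
            combined.setdefault(field, value)
        merged.append(combined)
    return merged


# Made-up sample records for illustration.
bulk = [
    {"bioguide_id": "A000001", "name": "Jane Doe", "state": "CA"},
    {"bioguide_id": "B000002", "name": "John Roe", "state": "TX"},
]
propublica = [
    {"bioguide_id": "A000001", "twitter_account": "janedoe", "state": "California"},
]

result = merge_member_details(bulk, propublica)
print(result)
```

Keeping the bulk-data value when both sources set the same field, and keeping members that have no ProPublica match, are choices made for this sketch; the real script may resolve either case differently.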