Output Congressional Bill and Member data to Postgres using Scrapy.

tkah/congress-scrapy

US Congressional Bill and Member Scrapers

Two scrapers that write Congressional bill data and Congressional member data (House and Senate) to a Postgres database. The scrapers are written with Scrapy, following this tutorial.

Used by my site https://betterknowyourdistrict.com/

Running

  • Install the python dependencies: pip install -r requirements.txt
  • Create a settings.py file in the following directories - bill_scraper/scraper_app/ and member_scraper/scraper_app/.
    • bill_scraper/scraper_app/settings.py:
         BOT_NAME = 'congressionalbills'
         SPIDER_MODULES = ['scraper_app.spiders']
         DATABASE = {
             'drivername': 'postgres',
             'host': 'localhost',
             'port': '5432',
             'username': '[YOUR POSTGRES USER]',
             'password': '[YOUR POSTGRES PASSWORD]',
             'database': '[YOUR POSTGRES DB]'
         }
         ITEM_PIPELINES = { 'scraper_app.pipelines.BillPipeline': 200 }
         CONGRESS = '[YOUR CONGRESS NUMBER]'
    • member_scraper/scraper_app/settings.py:
         BOT_NAME = 'congressionalmembers'
         SPIDER_MODULES = ['scraper_app.spiders']
         DATABASE = {
             'drivername': 'postgres',
             'host': 'localhost',
             'port': '5432',
             'username': '[YOUR POSTGRES USER]',
             'password': '[YOUR POSTGRES PASSWORD]',
             'database': '[YOUR POSTGRES DB]'
         }
         ITEM_PIPELINES = { 'scraper_app.pipelines.MemberPipeline': 200 }
  • Run each scraper from within its respective directory: bill_scraper or member_scraper.
  • Bill Scraper command: scrapy crawl congressionalbills
  • Member Scraper command: scrapy crawl congressionalmembers
  • Combine ProPublica member information with the bulk data by running get_member_details.py after the XML bulk data scraper has finished.
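
The crawl steps above can be sketched as a small Python helper. This is only an illustration, not part of the repository: the function names (build_commands, missing_settings, run_all) are hypothetical, and it assumes the scrapy command is on your PATH and that the directory layout matches the one described above. It checks that both settings.py files exist, then runs each crawl from its own directory.

```python
import subprocess
import sys
from pathlib import Path

# (directory, spider name) pairs taken from the steps above.
SCRAPERS = [
    ("bill_scraper", "congressionalbills"),
    ("member_scraper", "congressionalmembers"),
]


def build_commands(repo_root):
    """Return (working_directory, argv) pairs, one per scraper."""
    root = Path(repo_root)
    return [(root / directory, ["scrapy", "crawl", spider])
            for directory, spider in SCRAPERS]


def missing_settings(repo_root):
    """List the settings.py files that still need to be created."""
    root = Path(repo_root)
    expected = [root / directory / "scraper_app" / "settings.py"
                for directory, _ in SCRAPERS]
    return [path for path in expected if not path.is_file()]


def run_all(repo_root="."):
    """Hypothetical helper: run both crawls, stopping at the first failure."""
    missing = missing_settings(repo_root)
    if missing:
        sys.exit("Create these files first: " + ", ".join(map(str, missing)))
    for cwd, argv in build_commands(repo_root):
        # cwd makes Scrapy pick up the project rooted in that directory.
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling run_all() from the repository root would run the bill scraper and then the member scraper. get_member_details.py is left out on purpose, since its location and arguments are not stated here; run it by hand afterwards.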
