CAR18 Chicago

Introduction to Web Scraping

Adapted from Alex Richards' (@alexrichards) excellent IRE17 class.

He'll also be teaching a repeat web scraping session Sunday!

This session will cover:

How web scraping will make your life easier
How to do so responsibly
Using third-party Python packages
Fetching web pages with Python
Navigating the HTML in those pages to get data
Structuring scraped data and writing it to a CSV
And a couple of tips on shortcuts with HTML tables!
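
As a rough preview of the fetch, navigate, and write-to-CSV steps listed above, here is a minimal sketch using requests and BeautifulSoup4 from the package list. The sample HTML, the table id "staff", and the function names are made up for illustration and are not taken from the class materials; fetch_html() is defined but not called, so the sketch runs without a network connection.

```python
import csv
import io

import requests
from bs4 import BeautifulSoup

# Hypothetical sample page; in practice the HTML would come from fetch_html().
SAMPLE_HTML = """
<html><body>
  <table id="staff">
    <tr><th>Name</th><th>Role</th></tr>
    <tr><td>Ada</td><td>Reporter</td></tr>
    <tr><td>Grace</td><td>Editor</td></tr>
  </table>
</body></html>
"""


def fetch_html(url):
    """Download a page and return its HTML text (not called in this sketch)."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()  # stop early on 4xx/5xx responses
    return response.text


def parse_rows(html):
    """Return the table as a list of rows, each a list of cell strings."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="staff")
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)
    return rows


def rows_to_csv(rows):
    """Serialize rows to CSV text; use open(...) instead of StringIO to write a file."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


rows = parse_rows(SAMPLE_HTML)
csv_text = rows_to_csv(rows)
print(csv_text)
```

To scrape a real page, pass the result of fetch_html(url) to parse_rows() and adjust the table lookup to match that page's HTML. Pausing between requests (for example with time.sleep) is one simple way to avoid hammering a site.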

Software requirements:

You should have Python 3 on your machine. To check that it is installed, type the following in Bash (on macOS, you can access it through the application called Terminal):

which python3

It should return something like:

/Library/Frameworks/Python.framework/Versions/3.5/bin/python3

If not, and you're in the CAR18 class, you should flag down the instructor or a TA. If you're not in the class, download Python 3.

If you already have Python 3, you should be able to run the command pip install -r requirements.txt after downloading this repository to get the packages listed below:

BeautifulSoup4
Jupyter
pandas
requests
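
One way to confirm the install worked is to ask Python which versions of these packages it can see. This is a minimal sketch, not part of the class materials: it assumes Python 3.8 or newer (for importlib.metadata) and uses the lowercase names pip knows the packages by. The helper name installed_versions is made up for illustration.

```python
import sys
from importlib import metadata  # standard library in Python 3.8+

# Distribution names as pip knows them (taken from the package list above).
REQUIRED = ["beautifulsoup4", "jupyter", "pandas", "requests"]


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


print("Python", sys.version.split()[0])
status = installed_versions(REQUIRED)
for name, version in status.items():
    print(name, version or "MISSING")
```

Any package reported as MISSING can usually be fixed by rerunning the pip install command above.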

Have questions?

You can always:

Send Alex a note (arichards@nerdwallet.com)
DM Alex on Twitter
Reach out to Melissa
Open an issue here

Struggling with installation? Try this updated guide for Windows and OS X.

Resources:

Python

PyCAR for in-depth Python learning
Codecademy for Python syntax
Think Python, a popular introductory book whose digital edition is available free online

Scraping

The Coursera class Using Python to Access Web Data; you may want to take the classes that precede it first

The Internet

How the Internet Works, a PyCon 2013 talk by Jessica McKellar
How Does The Internet, a zine as informative as it is cute, by Amy Wibowo
