Skip to content

DSSatPitt/web-scraping-workshop-2015

Repository files navigation

Transforming the Web into Data (with Python)

Workshop Materials

This repository contains the materials for the ULS/iSchool Digital Scholarship workshop held on Friday, April 17th 2015.

There are two main documents in this repository (and a couple supporting images), the slides (Web Scraping Tutorial.ipynb) and an example (Web Scraping Example.ipynb).

These documents are stored in this repository as IPython Notebooks, meaning they are JSON documents and not . The links below point to nbviewer so you can read them as a normal human being and not a machine.

If you are interested in building on top of these materials, feel free to fork this repository. You are fee to SHARE and ADAPT these mateirals as long as you ATTRIBUTE them as per the following creative commons license: CC-BY 2.0.

Slides

These slides contain a conceptual introduction to web scraping. They can be viewed as a document or as a set of slides.

Example

This notebook contains an example web scrape using Python with some in-line documentation about what is happening at each step.

Using tmpnb, the temporary notebook service

The materials in this repository can be served to participants using the jupyter/tmpnb service. I've included a Dockerfile in this repository that can be used to build an image that contains IPython, the necesary python libraries, and the notebooks in this repository. I built an image from this Dockerfile and called it jupyter/minimal so the tmpnb service would just automatically run it because that is the name of the default image tmpnb launches for temporary notebooks. This is pretty bad documentation, if you have questions just hit me up on twitter at @mcburton. I'll probably write up something more comprehsive about setting up temporary teaching enviroments with tmpnb.

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published