A 2-day workshop on how to scrape web forums with R
This intermediate course will teach you how to scrape user-generated content from the internet using R. On the first day, the course will start with a theoretical introduction to web scraping and specific approaches to scraping forums. We will then select two web forums and go through the process of scraping them. On the second day, we will discuss different kinds of websites that present problems for web scraping. One example of these is websites with dynamically generated content. We will select one such website and walk through the process of collecting data from it.
This is an intermediate-level course. Students must have a basic background in R.
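To give a flavour of what scraping a forum looks like in R, here is a minimal sketch. It assumes the `rvest` package is installed (it is not among the packages listed further down), and the HTML snippet, its class names (`post`, `author`, `body`) and the variable names are all made up for illustration. A real forum will have its own markup, and the course material may take a different approach.

```r
# Assumption: rvest is installed; it is not part of the install list in this README.
library(rvest)

# Hypothetical forum page, inlined so the sketch runs without network access.
page <- read_html('
  <html><body>
    <div class="post"><span class="author">alice</span><p class="body">First post</p></div>
    <div class="post"><span class="author">bob</span><p class="body">A reply</p></div>
  </body></html>
')

# Select every post, then pull one author and one body out of each post.
posts <- html_elements(page, ".post")

forum <- data.frame(
  author = html_text2(html_element(posts, ".author")),
  body   = html_text2(html_element(posts, ".body"))
)

print(forum)
```

Selecting the posts first and then extracting one element per post keeps authors and bodies aligned, even when a post is missing one of the two fields (the missing value becomes `NA`).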
- Go to https://noteable.edina.ac.uk/login
- Log in with your EASE credentials
- Select RStudio as a personal notebook server and press start
- Go to File > New Project > Version Control > Git
- Copy and paste https://github.com/DCS-training/WebScrapingROctober2023/ as the Repository URL (the Project directory name will be filled in automatically, but you can change it if you want your folder in Noteable to have a different name)
- Decide where to locate the folder. By default, it will be placed in your home directory
- Press Create Project

Congratulations, you have now pulled the content of the repository into your Noteable server space.
- Go to [https://www.r-project.org/](https://www.r-project.org/)
- Go to the download link
- Choose the CRAN mirror nearest to your location (either Bristol or Imperial College London)
- Download the version that corresponds to your operating system (Windows, Mac or Linux)
- For Windows, click on "install R for the first time", then download R for Windows and follow the installation wizard. If you get stuck, follow this [video tutorial](https://www.youtube.com/watch?v=GAGUDL-4aVw)
- For Mac, download the most recent pkg file and follow the installation wizard. If you get stuck, follow this [video tutorial](https://www.youtube.com/watch?v=EmZqlcKkJMM)
- Once R is installed, you can install RStudio (an interface for R)
- Go to [www.rstudio.com](https://www.rstudio.com)
- Go to the download page
- Download the version that corresponds to your operating system and install it. If you get stuck, check the videos linked above.
install.packages("tidyverse")
install.packages("lme4")
install.packages("effects")
install.packages("sjPlot")
install.packages("interactions")
library("tidyverse") #for cleaning and sorting out data
library("lme4") #for fitting linear mixed-effects models (LMMs)
library("effects") #for creating tables and graphics that illustrate effects in linear models
library("sjPlot") #for plotting models
library("interactions") #for plotting interaction effects
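If some of these packages are already on your machine, you may prefer to install only the ones that are missing. The sketch below is one way to do that with base R. The helper names `missing_packages` and `install_missing` are hypothetical and chosen for illustration; they are not part of any package.

```r
# Packages named in the install list above.
course_packages <- c("tidyverse", "lme4", "effects", "sjPlot", "interactions")

# Return the packages in `pkgs` that are not yet installed.
# `installed` defaults to the names of all packages in your R libraries.
missing_packages <- function(pkgs, installed = rownames(installed.packages())) {
  setdiff(pkgs, installed)
}

# Install only what is missing; returns the names it tried to install.
install_missing <- function(pkgs) {
  to_install <- missing_packages(pkgs)
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  invisible(to_install)
}
```

Running `install_missing(course_packages)` in the RStudio console would then skip anything that is already installed.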
Once ready, you will find:
- .ppt presentations used during the course
- example code
James Besse