A Python Selenium program that converts York University course data to JSON
- What is this?
- Requirements
- Installation
- Running the Program
- Arguments
- Files in this Repository
- Donate
This program outputs a JSON file that is generated by traversing York University's course website.
Copy and paste this into your Terminal to install the required libraries:
pip install bs4 ; pip install configparser
You can install this program by typing:
git clone https://github.com/hussein-esmail7/yorku-class-scraper
cd yorku-class-scraper/
To use this program, you have to make sure you are in the correct directory (using the cd command).
python3 yorku-scrape.py
There are two ways to give this program its arguments:
- When the program asks for them, if they were not given in the Terminal, or
- In the Terminal. See Command Arguments for the ways to put them in.
- HTML File(s) Location: Where the local HTML files are stored. Because of how this program is structured, these files have to be offline. To save them, press Ctrl+S or Cmd+S while on each page in the browser, and save them to the folder you are going to give the program. If your output JSON file looks like it is missing courses, try redownloading your HTML files, as they might not have fully loaded. Tip: Scroll all the way to the bottom of each page before you download.
- JSON File Name: What you want to name your output JSON file. The program asks for this before it starts iterating through the courses, so that if the run takes a while you don't have to sit at your computer waiting for the prompt. At the moment, it only makes a JSON file in the same directory as the file.
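One way to spot pages that may not have fully loaded before they were saved is to compare file sizes in the HTML folder. The sketch below is not part of this program: the function names and the "half the median size" cutoff are assumptions chosen for illustration, and a small file is only a hint, not proof, that a page is incomplete.

```python
from pathlib import Path
from statistics import median


def saved_page_sizes(folder):
    """Map each .html/.htm file name in `folder` to its size in bytes."""
    return {
        p.name: p.stat().st_size
        for p in sorted(Path(folder).iterdir())
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    }


def suspiciously_small(sizes, ratio=0.5):
    """Return the names whose size is below `ratio` times the median size.

    The 0.5 default is an arbitrary, hypothetical cutoff.
    """
    if not sizes:
        return []
    cutoff = ratio * median(sizes.values())
    return [name for name, size in sizes.items() if size < cutoff]
```

Calling suspiciously_small(saved_page_sizes("html/")) would list the saved pages worth redownloading first.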
- -h, --help: Show the help message and exit the program.
- -o, --output: The output JSON file name, as a string.
- -c, --code: The course code(s) you want to get. Examples: EECS, ADMS, EN, ENG, etc. If you are inputting multiple codes, make sure to put them in quotations. Examples: "EECS ADMS", "EECS, WRIT"
- -q, --quiet: Quiet mode. Only display text when required (progress bars not included, use --no-progress for this).
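For readers who want to see how flags like these fit together, here is a minimal sketch using Python's standard argparse module. It mirrors the options listed above but is not taken from this program's source: build_parser, split_codes, and resolve_output are hypothetical names, and treating both spaces and commas as separators between course codes is an assumption based on the two quoted examples.

```python
import argparse
import re


def build_parser():
    """Build a parser with the documented flags (-h/--help is automatic)."""
    parser = argparse.ArgumentParser(
        description="Convert saved York University course pages to JSON."
    )
    parser.add_argument("-o", "--output", help="output JSON file name")
    parser.add_argument(
        "-c", "--code", default="",
        help='course code(s), e.g. "EECS ADMS" or "EECS, WRIT"',
    )
    parser.add_argument(
        "-q", "--quiet", action="store_true",
        help="only display text when required",
    )
    parser.add_argument(
        "--no-progress", action="store_true", help="hide progress bars"
    )
    return parser


def split_codes(raw):
    """Split a string such as "EECS, WRIT" or "EECS ADMS" into a list."""
    return [code for code in re.split(r"[,\s]+", raw.strip()) if code]


def resolve_output(args, ask=input):
    """Use the -o value if given in the Terminal, otherwise ask for it."""
    return args.output or ask("JSON file name: ")
```

With this sketch, build_parser().parse_args(["-c", "EECS, WRIT", "-q"]) followed by resolve_output would prompt for the file name, matching the "asks for it if it is not given" behaviour described above.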
- dict-format.pdf: PDF of how the JSON is formatted in case you want to make your own program that uses this data (and I recommend you do! I want this repository to help others).
- dict-format.tex: LaTeX source file for dict-format.pdf
- html/: HTML files that were used to make the JSON files in json/
- json/: This folder contains JSON files I've already created by using this program to potentially save the next person some time.
- yorku_scrape.py: Main Python program
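If you want to build your own program on top of the files in json/, a reasonable first step is to load one and look at its top-level shape before reading dict-format.pdf in detail. This sketch makes no assumption about the actual structure of the data; load_course_data and describe are hypothetical helper names, and any file name you pass in is your own choice.

```python
import json
from pathlib import Path


def load_course_data(path):
    """Read a JSON file produced by the scraper and return the parsed value."""
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)


def describe(data):
    """Return a one-line summary of the top-level JSON value."""
    if isinstance(data, dict):
        return f"dict with {len(data)} keys"
    if isinstance(data, list):
        return f"list with {len(data)} items"
    return type(data).__name__
```

Printing describe(load_course_data(...)) for a file in json/ shows whether the top level is a dictionary or a list, which tells you how to start iterating over it.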