OscarWorks Crawler

Welcome to the OscarWorks Crawler.

This is a web crawler designed to crawl through Waterloo Works and/or Oscar Plus.

This project has been made because it is difficult to quickly find the jobs that have the matching keywords you want.

~~The WaterlooWorks Crawler is still an in-progress project. It does not work, but I will have it working once I get back on my study term.~~

The WaterlooWorks Crawler is COMPLETE!!!

I still need to figure out a nice way for running both crawlers in parallel.

What does it do?

Run it with:

python crawl.py -k keywords.csv --login auth.txt -o output_folder/ --waterloo
or
python crawl.py -k keywords.csv --login auth.txt -o output_folder/ --mcmaster --waterloo

This will run through all the job postings on Waterloo Works or Oscar Plus with your authentication and download all the job postings and descriptions of the jobs that contain the keywords in the keywords.csv file

Running help with:

$ python crawl.py -h

Will get you help that looks like this:

OscarWorks Crawler (https://github.com/kunalchandan/OscarWorks_Crawler) is a
command line tool for extracting the job descriptions from Waterloo Works or
Oscar Plus (as of 2019 Feb) that contain the keywords specified in
keywords.csv the authorization to enter the sites is user provided in the
auth.txt file

optional arguments:
  -h, --help            show this help message and exit
  -k CSV_FILE, --keywords CSV_FILE
                        The keywords comma separated values file, default
                        value is "keywords.csv".
  -l LOGIN_FILE, --login LOGIN_FILE
                        The login file, default value is "auth.txt".
  -o OUTPUT_FOLDER, --output-folder OUTPUT_FOLDER
                        The output folder, default value is "query_find/".

Required Arguments:
  -m, --mcmaster        Flag specifies to log into McMaster's OscarPlus
  -w, --waterloo        Flag specifies to log into Waterloo's WaterlooWorks

Requirements

pip install selenium bs4 httplib2

Name		Name	Last commit message	Last commit date
Latest commit History 9 Commits
.gitignore		.gitignore
README.md		README.md
auth.txt		auth.txt
crawl.py		crawl.py
keywords.csv		keywords.csv

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

OscarWorks Crawler

What does it do?

Requirements

About

Releases

Packages

Languages

kunalchandan/OscarWorks_Crawler

Folders and files

Latest commit

History

Repository files navigation

OscarWorks Crawler

What does it do?

Requirements

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages