rankr is a platform for aggregating the results of different academic rankings.
rankr crawls university ranking tables and, after some pre-processing, stores the results in .csv files. At the moment, it supports QS, Shanghai (ARWU), and Times Higher Education (THE), covering both their world university rankings and their subject rankings.
To match academic institutions across ranking systems (and so aggregate their results), each institution needs a unique ID. After some research, the GRID ID system was chosen; it provides a free database of about 100,000 research institutions. For example, the GRID ID for Sharif University of Technology is grid.412553.4.
rankr first stores the entire GRID ID data (release 2020-06-29) in a database (using SQLAlchemy) and then iterates through each crawled ranking table and tries to match institutions with their respective GRID counterparts.
Several matching methods are used, based on:
- Institution profile URL in each ranking table
- Institution name & country
- Fuzzy matching of institutions
- Manual matching
Note that the metadata from the GRID database is preferred in case of any discrepancy. For example, if a ranking website lists an institution under Country A but the GRID database records it under Country B, Country B is used.
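The matching steps above can be sketched as a simple cascade. This is a minimal illustration, not rankr's actual implementation: the function names, the record layout, the known_urls and manual lookup tables, the 0.85 threshold, and the use of difflib for fuzzy matching are all assumptions, and the second GRID record is a made-up placeholder.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple

# Tiny stand-in for the GRID table (the real data lives in a database).
# The second record is a made-up placeholder.
GRID: List[dict] = [
    {"grid_id": "grid.412553.4", "name": "Sharif University of Technology", "country": "Iran"},
    {"grid_id": "grid.placeholder.0", "name": "Example University", "country": "Country B"},
]


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so comparisons ignore trivial differences."""
    return " ".join(text.lower().split())


def match_institution(
    row: dict,
    grid: List[dict],
    known_urls: Dict[str, str],
    manual: Dict[Tuple[str, str], str],
    threshold: float = 0.85,
) -> Optional[dict]:
    """Return the GRID record for a crawled row, or None if nothing matches."""
    by_id = {g["grid_id"]: g for g in grid}

    # 1. Profile URL that has already been linked to a GRID ID.
    grid_id = known_urls.get(row.get("url") or "")
    if grid_id in by_id:
        return by_id[grid_id]

    # 2. Exact match on normalized name and country.
    key = (normalize(row["name"]), normalize(row["country"]))
    for g in grid:
        if (normalize(g["name"]), normalize(g["country"])) == key:
            return g

    # 3. Fuzzy match on the name only (the country may differ between sources).
    best, best_score = None, 0.0
    for g in grid:
        score = SequenceMatcher(None, key[0], normalize(g["name"])).ratio()
        if score > best_score:
            best, best_score = g, score
    if best is not None and best_score >= threshold:
        return best

    # 4. Manually curated mapping as a last resort.
    return by_id.get(manual.get(key, ""))


def with_grid_metadata(row: dict, grid_record: dict) -> dict:
    """Merge a crawled row with its GRID record; GRID wins on name and country."""
    return {
        **row,
        "grid_id": grid_record["grid_id"],
        "name": grid_record["name"],
        "country": grid_record["country"],
    }
```

Trying the cheap, exact checks first and falling back to fuzzy and manual matching keeps false positives low; the order shown simply follows the list above.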
- Clone the repo:
git clone https://github.com/pmsoltani/rankr.git
- Switch to the repo directory:
cd rankr
- Make sure Docker is running.
- Create a .env file in the root directory (more info below).
- Create a data directory:
mkdir backend/data
- Download the GRID database (from the link above) and extract it inside the new data directory.
- Start the application:
docker-compose up -d
- Crawl the ranking tables:
docker-compose exec backend rankr crawl rankings
- Initialize the database for the first time:
docker-compose exec backend rankr db reset --confirm
- And you're done! Visit the following URL in your browser: http://0.0.0.0:8000
Please note that the project is still in a pre-alpha stage and is not ready for production. As of now, the instructions above will fail because of a small bug in the GRID database (release 2020-06-29): line 62833 of the file addresses.csv inside the grid/full_tables directory has an extra space character at the end of the country name: "Bonaire, Saint Eustatius and Saba ". Hopefully, this will be corrected in the database's next release. Until then, the file must be corrected manually (see the TODO section).
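Instead of editing the file by hand, the stray space can be removed with a short script. The sketch below is not part of rankr; the function name is hypothetical, and it assumes that no field in the file relies on leading or trailing whitespace, so it strips every field rather than only the country column.

```python
import csv
from pathlib import Path


def strip_field_whitespace(src: Path, dst: Path) -> int:
    """Copy src to dst with surrounding whitespace removed from every field.

    Returns the number of fields that were changed.
    """
    changed = 0
    with src.open(newline="", encoding="utf-8") as fin, \
            dst.open("w", newline="", encoding="utf-8") as fout:
        writer = csv.writer(fout, lineterminator="\n")
        for row in csv.reader(fin):
            cleaned = [field.strip() for field in row]
            changed += sum(old != new for old, new in zip(row, cleaned))
            writer.writerow(cleaned)
    return changed


# Example call (the path is an assumption based on the setup steps above):
# strip_field_whitespace(
#     Path("backend/data/grid/full_tables/addresses.csv"),
#     Path("backend/data/grid/full_tables/addresses.fixed.csv"),
# )
```

The script writes to a separate file so the original stays untouched until the result has been checked. The csv writer re-applies minimal quoting, so quoting in the output may differ from the original even where the values are identical.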
For obvious security reasons, the project's environment variables file (the .env file) is not included in the repo, but here is a short, working version of it:
COMPOSE_PROJECT_NAME=rankr
INSTALL_PATH=/home/rankr
APP_ENV=development
POETRY_VERSION=1.0.10
APP_NAME=rankr
APP_HOST=0.0.0.0
APP_PORT=8000
USER_AGENT=Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/537.36
DIALECT=postgresql
POSTGRESQL_DRIVER=psycopg2
# database host is the docker compose service name
POSTGRESQL_HOST=postgres
POSTGRESQL_PORT=5432
POSTGRESQL_NAME=rankr
POSTGRESQL_USER=rankr
POSTGRESQL_PASS=postgres_super_secret_password
ADMINER_HOST=0.0.0.0
ADMINER_PORT=5050
ADMINER_DRIVER=pgsql
ADMINER_SERVER=postgres
ADMINER_DB=rankr
ADMINER_USERNAME=rankr
ADMINER_PASSWORD=postgres_super_secret_password
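To show how the database settings fit together, here is a minimal sketch that assembles a SQLAlchemy-style connection URL from these variables. How rankr itself reads them is not shown here, so the database_url helper is hypothetical; only the variable names come from the file above.

```python
from typing import Mapping
from urllib.parse import quote_plus


def database_url(env: Mapping[str, str]) -> str:
    """Build a dialect+driver://user:password@host:port/name URL from the settings."""
    dialect = env["DIALECT"]
    prefix = dialect.upper()  # e.g. "postgresql" -> "POSTGRESQL"
    # Quote the credentials so special characters cannot break the URL.
    user = quote_plus(env[f"{prefix}_USER"])
    password = quote_plus(env[f"{prefix}_PASS"])
    return (
        f"{dialect}+{env[f'{prefix}_DRIVER']}://{user}:{password}"
        f"@{env[f'{prefix}_HOST']}:{env[f'{prefix}_PORT']}/{env[f'{prefix}_NAME']}"
    )


sample = {
    "DIALECT": "postgresql",
    "POSTGRESQL_DRIVER": "psycopg2",
    "POSTGRESQL_HOST": "postgres",
    "POSTGRESQL_PORT": "5432",
    "POSTGRESQL_NAME": "rankr",
    "POSTGRESQL_USER": "rankr",
    "POSTGRESQL_PASS": "postgres_super_secret_password",
}
url = database_url(sample)
# url == "postgresql+psycopg2://rankr:postgres_super_secret_password@postgres:5432/rankr"
```

Inside the Docker stack the same function could be called with os.environ. Note that the host is the compose service name (postgres), so this URL only resolves from within the compose network.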
The stack currently has three containers:
- postgres, used to store the data persistently.
- adminer (optional), serves as a GUI for managing the database. It can be removed by commenting out or deleting the adminer section of the docker-compose.yml file.
- backend, hosts the API server.
- Create a web server and an API.
- Design and develop a dashboard.
- Finish the documentation.
- Make countries.csv a public file.
- Update the config.py module to reflect the recent changes (i.e., the new rankr CLI).
- Add more functionality to the CLI (e.g., starting the web server and running the tests).
- Dockerize the app.
- Add subject ranking tables.
All contributions (suggestions, PRs, ...) are welcome.