This is a PyTorch implementation of the following paper:
Weihua Hu*, Bowen Liu*, Joseph Gomes, Marinka Zitnik, Percy Liang, Vijay Pande, Jure Leskovec. Strategies for Pre-training Graph Neural Networks. ICLR 2020.
If you make use of the code or experiments in your work, please cite our paper (BibTeX below).
@inproceedings{
hu2020pretraining,
title={Strategies for Pre-training Graph Neural Networks},
author={Hu, Weihua and Liu, Bowen and Gomes, Joseph and Zitnik, Marinka and Liang, Percy and Pande, Vijay and Leskovec, Jure},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=HJlWWJSFDH},
}
We used the following Python packages for core development. We tested on Python 3.7.
pytorch 1.0.1
torch-cluster 1.2.4
torch-geometric 1.0.3
torch-scatter 1.1.2
torch-sparse 0.2.4
torch-spline-conv 1.0.6
rdkit 2019.03.1.0
tqdm 4.31.1
tensorboardx 1.6
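To compare an existing environment against the versions listed above, a check along the following lines may help. This is a minimal sketch, not part of the repository: the helper names `installed_version` and `find_mismatches` are hypothetical, `importlib.metadata` requires Python 3.8 or newer, and the package names are copied from the list above, so they may differ from the distribution names your installer uses (for example, the pip distribution of PyTorch is named `torch`).

```python
from importlib import metadata

# Versions copied from the package list above. The keys are the names as
# written in this README; adjust them if your installer uses other names.
EXPECTED = {
    "pytorch": "1.0.1",
    "torch-cluster": "1.2.4",
    "torch-geometric": "1.0.3",
    "torch-scatter": "1.1.2",
    "torch-sparse": "0.2.4",
    "torch-spline-conv": "1.0.6",
    "rdkit": "2019.03.1.0",
    "tqdm": "4.31.1",
    "tensorboardx": "1.6",
}


def installed_version(name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to (wanted, found)."""
    mismatches = {}
    for name, wanted in expected.items():
        found = lookup(name)
        if found != wanted:
            mismatches[name] = (wanted, found)
    return mismatches
```

Calling `find_mismatches(EXPECTED)` returns an empty dictionary when every listed package is found at the listed version. A package reported with `None` was not found under that name.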
All the necessary data files can be downloaded from the following links.
For the chemistry dataset, download from chem data (2.5GB), unzip it, and put it under chem/.
For the biology dataset, download from bio data (2GB), unzip it, and put it under bio/.
In each directory, we have three kinds of files used to train GNNs.
python pretrain_contextpred.py --output_model_file OUTPUT_MODEL_PATH
python pretrain_masking.py --output_model_file OUTPUT_MODEL_PATH
python pretrain_edgepred.py --output_model_file OUTPUT_MODEL_PATH
python pretrain_deepgraphinfomax.py --output_model_file OUTPUT_MODEL_PATH
This will save the resulting pre-trained model to OUTPUT_MODEL_PATH.
python pretrain_supervised.py --output_model_file OUTPUT_MODEL_PATH --input_model_file INPUT_MODEL_PATH
This will load the pre-trained model in INPUT_MODEL_PATH, further pre-train it using supervised pre-training, and then save the resulting pre-trained model to OUTPUT_MODEL_PATH.
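The two pre-training stages above can be chained so that the model saved by the first command is the one loaded by the second. The following is a rough sketch under stated assumptions: `two_stage_commands` and `run_commands` are hypothetical helpers, not part of the repository, and the sketch assumes the same path string can be passed to `--output_model_file` in the first stage and to `--input_model_file` in the second.

```python
import subprocess
import sys


def two_stage_commands(first_stage_script, intermediate_path, final_path):
    """Build the command lines for both pre-training stages.

    first_stage_script is one of the pre-training scripts listed above,
    e.g. "pretrain_masking.py". Both paths are chosen by the caller.
    """
    stage_one = [
        sys.executable, first_stage_script,
        "--output_model_file", intermediate_path,
    ]
    stage_two = [
        sys.executable, "pretrain_supervised.py",
        "--output_model_file", final_path,
        # The second stage starts from the model saved by the first stage.
        "--input_model_file", intermediate_path,
    ]
    return [stage_one, stage_two]


def run_commands(commands):
    """Run the commands in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)
```

For example, `run_commands(two_stage_commands("pretrain_masking.py", "stage1_model", "stage2_model"))` would run both stages in sequence, where the two model paths are placeholders. Run it from the directory that contains the scripts.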
python finetune.py --model_file INPUT_MODEL_PATH --dataset DOWNSTREAM_DATASET --filename OUTPUT_FILE_PATH
This will fine-tune the pre-trained model specified in INPUT_MODEL_PATH using the dataset DOWNSTREAM_DATASET.
The result of fine-tuning will be saved to OUTPUT_FILE_PATH.
We release pre-trained models in model_gin/ and model_architecture/ for both biology (bio/) and chemistry (chem/) applications. Feel free to take the models and use them in your applications!
Our results in the paper can be reproduced by running sh finetune_tune.sh SEED DEVICE, where SEED is a random seed ranging from 0 to 9, and DEVICE specifies the ID of the GPU on which to run the script. This script will fine-tune our saved pre-trained models on each downstream dataset.
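To cover all ten seeds on one GPU, the script can be invoked in a loop. The sketch below is one possible way to do this and is not part of the repository: `finetune_commands` and `run_all_seeds` are hypothetical helper names, and the sketch assumes it is run from the directory that contains finetune_tune.sh.

```python
import subprocess


def finetune_commands(device, seeds=range(10)):
    """Build one `sh finetune_tune.sh SEED DEVICE` command per seed."""
    return [
        ["sh", "finetune_tune.sh", str(seed), str(device)]
        for seed in seeds
    ]


def run_all_seeds(device):
    """Run the fine-tuning script for seeds 0 to 9, one after another."""
    for command in finetune_commands(device):
        # check=True stops the loop if any run exits with an error.
        subprocess.run(command, check=True)
```

Calling `run_all_seeds(0)` runs the seeds sequentially on GPU 0. Running the seeds one after another keeps a single GPU from being shared between runs, at the cost of wall-clock time.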