
Graph-MLP Sampling Extension

IMPORTANT: This is a fork of Graph-MLP (https://github.com/yanghu819/Graph-MLP) in which we add new sampling strategies and additional datasets.

This repository builds on the official PyTorch implementation of Graph-MLP: Node Classification without Message Passing in Graph. For details on the original Graph-MLP, please refer to the paper: https://arxiv.org/abs/2106.04051

[Figure: Graph-MLP pipeline]

[Figure: Graph-MLP results]

Structure

  • data
    • Contains the pre-computed adjacency matrices and node degree values (see the sketch after this list).
  • dataset
    • Contains the raw and processed data of the used datasets.
  • run-scripts
    • All run scripts (the calc-adj-*.sh scripts only compute the adjacency matrix, without running the rest of the Graph-MLP pipeline)
  • train.py
    • Main Python file that manages Graph-MLP
  • sample.py
    • Sampling methods implemented by us
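
Because the data folder only stores pre-computed matrices, it helps to know what is in them. In the Graph-MLP paper, the NContrast loss is driven by the symmetrically normalized adjacency matrix (with self-loops) raised to the r-th power, where r corresponds to the --order argument used below. The following is a minimal sketch of that computation, assuming a SciPy sparse input; the helper name adj_power is ours, and the repo's own pre-computation code may differ in its details:

import numpy as np
import scipy.sparse as sp

def adj_power(adj, order):
    # Illustrative sketch only -- not the repo's actual pre-computation code.
    # Add self-loops: A_hat = A + I
    adj = adj + sp.eye(adj.shape[0])
    # Symmetric normalization: D^{-1/2} A_hat D^{-1/2}
    deg = np.asarray(adj.sum(axis=1)).flatten()
    norm = sp.diags(deg ** -0.5) @ adj @ sp.diags(deg ** -0.5)
    # Raise to the given power (the --order argument of train.py)
    result = norm
    for _ in range(order - 1):
        result = result @ norm
    return result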

Requirements

  • PyTorch 1.7
  • Python 3.7

Installation

First, create a new virtual environment (or use your global one) and install PyTorch and PyTorch Geometric (PyG).

Install the correct PyTorch version for your system from the official website. Then install the matching PyG version (built against the same PyTorch and CUDA versions) by following the official PyG installation instructions.

After this is done, install the remaining requirements from requirements.txt by running:

pip install -r requirements.txt

WandB

Our implementation uses Weights & Biases to easily track your runs. To change the entity (personal or team account) and the project, change the variables at the top of train.py. To disable logging in a run, add the --no-wandb flag.
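
For illustration, those variables might look like the following; the names and values here are hypothetical, so check the top of train.py for the real ones:

# Hypothetical example -- the actual variable names in train.py may differ
WANDB_ENTITY = "your-username-or-team"   # personal or team account
WANDB_PROJECT = "graph-mlp-sampling"     # project that collects the runs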

To track runs, you have to log in to WandB on your device. To do this, activate the virtual environment and run:

wandb login

Datasets

This is a list of the datasets implemented in this extension. For the --data argument, use the name given in parentheses.

  • Cora (cora), CiteSeer (citeseer), Pubmed (pubmed): Citation networks, commonly used for a node classification baseline.
  • OGBN-arxiv (ogbn-arxiv): A large citation network based on arXiv papers from 2017 to 2019.
  • Reddit2 (reddit2): A condensed network of Reddit posts, where two posts are connected if the same user commented on both.
  • FacebookPagePage (facebook): A network of official Facebook pages connected by mutual likes.

Note: OGBN-arxiv and Reddit2 have many nodes. Make sure you have enough resources to run Graph-MLP on these datasets. See the table below for more information.

          Cora    CiteSeer  Pubmed  ogbn-arxiv  Reddit2     Facebook
Nodes     2,708   3,327     19,717  169,343     232,965     22,470
Edges     5,429   4,732     44,338  1,166,243   23,213,838  171,002
Classes   7       6         3       40          41          4
Features  1,433   3,703     500     128         602         14,000
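
If you want to sanity-check these numbers, the datasets can be loaded directly through PyG. A small sketch for Cora, assuming the repo's dataset directory is used as the root path:

from torch_geometric.datasets import Planetoid

# Download (if needed) and load Cora; the root path is an assumption
dataset = Planetoid(root='dataset', name='Cora')
data = dataset[0]
print(data.num_nodes)        # 2708
print(data.num_edges)        # PyG stores each undirected edge in both directions
print(dataset.num_classes)   # 7
print(dataset.num_features)  # 1433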

Samplers

See supplementary-information/Sampling.md.
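
As a rough intuition for what a sampler does here: every training step draws a set of node indices that forms the batch for the NContrast loss. Below is a minimal sketch of a uniform random batch; this is only an illustration, and the actual random_batch and the other strategies live in sample.py:

import torch

def random_batch(num_nodes, batch_size):
    # Illustrative sketch: draw batch_size node indices uniformly
    # at random, without replacement
    return torch.randperm(num_nodes)[:batch_size]

batch_idx = random_batch(num_nodes=2708, batch_size=2000)  # e.g. Cora settings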

Usage

## cora
python3 train.py --data=cora --epochs=400 --hidden=256 --dropout=0.6 --lr=0.001 --weight_decay=5e-3 --alpha=100.0 --batch_size=2000 --order=3 --tau=2 --sampler=random_batch

## citeseer
python3 train.py --data=citeseer --epochs=400 --hidden=256 --dropout=0.6 --lr=0.01 --weight_decay=5e-3 --alpha=1.0 --batch_size=2000 --order=2 --tau=1 --sampler=random_batch

## pubmed
python3 train.py --data=pubmed --epochs=400 --hidden=256 --dropout=0.6 --lr=0.001 --weight_decay=5e-3 --alpha=1.0 --batch_size=3000 --order=3 --tau=2 --sampler=random_batch

## FacebookPagePage
python3 train.py --data=facebook --epochs=400 --hidden=256 --dropout=0.6 --lr=0.001 --weight_decay=5e-4 --alpha=1.0 --batch_size=2000 --order=4 --tau=0.5 --sampler=random_batch

## ogbn-arxiv
python3 train.py --data=ogbn-arxiv --epochs=400 --hidden=2048 --dropout=0.15 --lr=0.001 --weight_decay=0 --alpha=30.0 --batch_size=7000 --order=3 --tau=15 --sampler=random_batch

## Reddit2
python3 train.py --data=reddit2 --epochs=400 --hidden=2048 --dropout=0.15 --lr=0.001 --weight_decay=0 --alpha=30.0 --batch_size=7000 --order=2 --tau=15 --sampler=random_batch

or

## Run all samplers on <dataset> ('cora', 'citeseer', 'pubmed', 'facebook')
bash run.sh <dataset>
## Run <sampler> on ogbn-arxiv, see supplementary-information/Sampling.md
bash run-arxiv.sh <sampler>
## Run <sampler> on Reddit2, see supplementary-information/Sampling.md
bash run-reddit.sh <sampler>

When an experiment finishes, its results are also appended to log.txt.

We also provide a tutorial for running our experiments on the BwUniCluster2.0 (see Unicluster.md). The run scripts mentioned above also contain the necessary Slurm definitions.

Cite

As this repository only extends Graph-MLP, please still cite the original paper if you use this code in your own work:

@misc{hu2021graphmlp,
      title={Graph-MLP: Node Classification without Message Passing in Graph}, 
      author={Yang Hu and Haoxuan You and Zhecan Wang and Zhicheng Wang and Erjin Zhou and Yue Gao},
      year={2021},
      eprint={2106.04051},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}
