Codebase for Evaluating Attribution for Graph Neural Networks.
Attribution is one tool in the interpretability toolkit that provides ranked importance values on an input (x) in relation to an output (y). You might care about using attribution techniques on models if you want to build credibility, debug a model, or create a hypothesis for scientific discovery. Not all attribution methods are created equal, and practitioners should understand the strengths and weaknesses of these techniques. We can evaluate these techniques because graphs are a natural testbed: we can create synthetic graph tasks where we can generate both labels and ground-truth attributions.
A code snippet that demonstrates how to create an attribution on a graph:
import graph_attribution as gatt
task_type = 'benzene'
block_type = 'gcn'
exp, task, methods = gatt.experiments.get_experiment_setup(task_type, block_type)
hp = gatt.hparams.get_hparams({'block_type':block_type, 'task_type':task_type})
gnn = gatt.experiments.GNN.from_hparams(hp, task)
gnn(exp.x_test)
# Train model here!
pred_att = methods['CAM'].attribute(exp.x_test, gnn)
result = task.evaluate_attributions(exp.att_test, pred_att)
print(result) # A dict of attribution statistics.
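To give a feel for what comparing predicted attributions against ground truth can involve, here is a minimal, standard-library sketch of one plausible metric: a ranking score (AUROC) over nodes. It is an illustration only and is not the implementation behind task.evaluate_attributions; the function name `attribution_auroc` and the toy data are assumptions made for this example.

```python
def attribution_auroc(true_mask, pred_scores):
    """Probability that a truly important node outranks an unimportant one.

    true_mask: list of 0/1 ground-truth labels, one per node.
    pred_scores: list of predicted importance values, one per node.
    """
    pos = [s for t, s in zip(true_mask, pred_scores) if t == 1]
    neg = [s for t, s in zip(true_mask, pred_scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one important and one unimportant node")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # correctly ranked pair
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Toy graph with 6 nodes; nodes 0-2 form the (hypothetical) ground-truth subgraph.
true_mask = [1, 1, 1, 0, 0, 0]
pred_scores = [0.9, 0.8, 0.2, 0.3, 0.1, 0.05]
score = attribution_auroc(true_mask, pred_scores)
print(round(score, 3))
```

A score of 1.0 means every ground-truth node is ranked above every other node, while 0.5 is what a random ranking would give on average.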
If you want to get up and running with building graph attributions from scratch, we recommend you run notebooks/train_and_evaluate.ipynb, which sets up an attribution task, trains a GNN on a predictive task, calculates attributions with several techniques, and finally evaluates the attributions. At the end of the notebook, you can visually compare graph attributions.
You can run code to replicate all results in the paper using notebooks/plot_evaluation_results.ipynb, which you can also run live in Colab (no downloads required).
If you'd like to run the code locally, or extend it, read on.
Attribution techniques:
- Grad * Input
- CAM (Class activation maps)
- GradCAM (Gradient CAM)
- SmoothGrad
- Integrated Gradients
- Attention weights
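As a rough illustration of how two of the gradient-based techniques above differ, the sketch below applies Grad * Input and Integrated Gradients to a toy quadratic model written in plain Python. The toy model, its analytic gradient, and all function names are assumptions for this example; in the codebase the model would be a GNN and gradients would come from TensorFlow's automatic differentiation.

```python
def model(x, w):
    """Toy differentiable model: f(x) = sum_i w_i * x_i^2."""
    return sum(wi * xi * xi for wi, xi in zip(w, x))


def gradient(x, w):
    """Analytic gradient of the toy model with respect to x."""
    return [2.0 * wi * xi for wi, xi in zip(w, x)]


def grad_times_input(x, w):
    """Grad * Input: elementwise product of the gradient and the input."""
    return [g * xi for g, xi in zip(gradient(x, w), x)]


def integrated_gradients(x, w, baseline=None, steps=50):
    """Integrated Gradients: average the gradient along a straight path
    from the baseline to x (midpoint rule), then scale by (x - baseline)."""
    if baseline is None:
        baseline = [0.0] * len(x)
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps
        point = [b + alpha * (xi - b) for b, xi in zip(baseline, x)]
        for i, g in enumerate(gradient(point, w)):
            total[i] += g
    return [(xi - b) * t / steps for xi, b, t in zip(x, baseline, total)]


x = [1.0, 2.0, -1.0]
w = [0.5, 1.0, 2.0]
gxi = grad_times_input(x, w)
ig = integrated_gradients(x, w)
print(gxi, ig)
```

For this toy model the Integrated Gradients values sum to f(x) - f(baseline), the completeness property of the method, whereas Grad * Input gives values twice as large because it only looks at the gradient at x.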
We test attribution quality on several GNN architectures:
- GCN (Graph Convolution Network), where our learnt representations depend on learnt nodes.
- GAT (Graph Attention Network), where message passing happens via an attention mechanism.
- MPNN (Message Passing Neural Network), where our learnt representations depend on learnt nodes and edges.
- GraphNets, learning node, edge and global embeddings and conditioning each based on these learnt attributes.
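To make the idea of node-based message passing concrete, here is a minimal plain-Python sketch of a single GCN-style propagation step, using the commonly cited symmetric normalization with self-loops followed by a linear map and a ReLU. It is not the Sonnet/Graph Nets implementation used in this codebase; the function name `gcn_layer` and the dense list-of-lists representation are assumptions for illustration.

```python
import math


def gcn_layer(adjacency, features, weights):
    """One GCN-style step: ReLU(D^-1/2 (A + I) D^-1/2 H W).

    adjacency: n x n list of lists with 0/1 entries (undirected graph).
    features:  n x in_dim node feature matrix H.
    weights:   in_dim x out_dim weight matrix W.
    """
    n = len(adjacency)
    in_dim, out_dim = len(weights), len(weights[0])
    # Add self-loops so each node also keeps its own features.
    a_hat = [[adjacency[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    degree = [sum(row) for row in a_hat]
    output = []
    for i in range(n):
        # Aggregate neighbor features with symmetric degree normalization.
        agg = [0.0] * in_dim
        for j in range(n):
            if a_hat[i][j]:
                coeff = a_hat[i][j] / math.sqrt(degree[i] * degree[j])
                for d in range(in_dim):
                    agg[d] += coeff * features[j][d]
        # Linear transform followed by ReLU.
        output.append([max(0.0, sum(agg[d] * weights[d][o] for d in range(in_dim)))
                       for o in range(out_dim)])
    return output


# Path graph 0 - 1 - 2 with a single scalar feature on node 0.
adjacency = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0], [0.0], [0.0]]
weights = [[1.0]]
hidden = gcn_layer(adjacency, features, weights)
print(hidden)
```

After one step the signal on node 0 reaches its direct neighbor (node 1) but not node 2, which is why stacking layers widens each node's receptive field.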
To test out new ideas, check out graph_attribution/templates.py, which has all the main abstract classes in the codebase. In particular, AttributionTask is useful for new tasks, TransparentModel for new GNN models, and AttributionTechnique for new attribution techniques.
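The general pattern for adding a new technique can be sketched as follows. This example does not import the real templates module: `AttributionTechniqueSketch` is a hypothetical stand-in whose only method, `attribute(x, model)`, mirrors the call shown in the snippet at the top of this README, and the graph is simplified to a list of node feature vectors. Check graph_attribution/templates.py for the actual abstract methods before subclassing.

```python
import random
from abc import ABC, abstractmethod


class AttributionTechniqueSketch(ABC):
    """Hypothetical stand-in for an attribution-technique base class."""

    name = "base"

    @abstractmethod
    def attribute(self, x, model):
        """Return one importance value per node of x."""


class RandomBaseline(AttributionTechniqueSketch):
    """Assigns uniformly random importances; useful as a sanity-check floor."""

    name = "Random"

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def attribute(self, x, model):
        # The model is ignored: a random baseline does not inspect it.
        return [self._rng.uniform(0.0, 1.0) for _ in x]


# Toy "graph": three nodes, each with a 2-dimensional feature vector.
toy_graph = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
baseline_att = RandomBaseline(seed=0).attribute(toy_graph, model=None)
print(baseline_att)
```

A random baseline like this gives a reference point: any technique worth using should score clearly better than it on a task with ground-truth attributions.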
The rest of the files are organized as:
- data/ holds all datasets, one folder per task.
- data/dataset_bias holds a folder for each spurious correlation task.
- data/results holds CSV files with results from the main publication.
- data/NOTICE details properties of this data redistribution.
- notebooks/ holds Jupyter notebooks.
- scripts/ holds Python scripts for generating datasets.
- graph_attribution/ holds the code for creating models, generating and evaluating attributions.
The codebase is primarily a TensorFlow 2.0 based framework that uses Sonnet and Graph Nets for building GNN models. If you are using pre-generated datasets, you can install the repo directly from GitHub with pip:
pip install git+https://github.com/google-research/graph-attribution
If you plan on generating datasets, we recommend using Anaconda for installing all dependencies. Requirements can be installed into a fresh conda environment as follows:
$ conda env create -f environment.yml -n graph_attribution
Once installed, you can run a notebook by running:
$ conda activate graph_attribution
$ jupyter notebook *.ipynb
If you use this code in your work, we ask that you cite our work. Here is an example BibTeX entry:
@inproceedings{NEURIPS2020_6054,
title = {Evaluating Attribution for Graph Neural Networks},
author = {Benjamin Sanchez-Lengeling and Jennifer Wei and Brian Lee and Emily Reif and Wesley Qian and Yiliu Wang and Kevin James McCloskey and Lucy Colwell and Alexander B Wiltschko},
booktitle = {Advances in Neural Information Processing Systems 33},
year = {2020},
url = {https://papers.nips.cc/paper/2020/hash/417fbbf2e9d5a28a855a11894b2e795a-Abstract.html},
}
If you cite this work, you may also want to cite:
- McCloskey, K., Taly, A., Monti, F., Brenner, M. P. & Colwell, L. J. Using attribution to decode binding mechanism in neural network models for chemistry. Proc. Natl. Acad. Sci. U. S. A. 116, 11624–11629 (2019).
- Ying, Z., Bourgeois, D., You, J., Zitnik, M. & Leskovec, J. GNNExplainer: Generating Explanations for Graph Neural Networks. in Advances in Neural Information Processing Systems (eds. Wallach, H. et al.) vol. 32 9244–9255 (Curran Associates, Inc., 2019).
This is not an official Google product.