The goal of this repo is to demonstrate the StellarGraph implementation of the GraphSAGE algorithm for graph node inference. A graph embedding is learned from a COVID-19 contact network and used to predict the risk classification of novel contacts, based on prior knowledge of contacts and a vulnerability measure. The resulting low-dimensional embedding of the GraphSAGE output layer stack can reveal useful contact proximity information that is not readily available from contact lists or vulnerability data alone.
Deep-trace is a GraphSAGE-based machine-learning pipeline for contact tracing. Conventional methods can only exploit knowledge of an individual person's contacts. Taken over the set of all individuals, this contact set is essentially a graph, with nodes representing people and edges representing contacts between people. The proposed method uses the information stored in the graph structure as well as node features to classify individuals in the contact set as susceptible, exposed or infected. In this particular case we use the COVID-19 vulnerability index to assign a feature vector to each node. We are then able to learn the contact network based not only on the graph node and edge list specification, but also on the vulnerability feature mapping.

Thus we create a three-dimensional node embedding for new contacts that shows an assessment of their likelihood of being in one of three exposure categories: Infected, Exposed or Susceptible. When a pandemic is evolving too quickly for limited personnel and test resources to reach everyone in the contact set, this low-dimensional embedding lets contact tracing personnel quickly identify and prioritize which persons to contact, test and isolate.
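To make the idea of combining contacts with node features concrete, here is a minimal pure-Python sketch of a single GraphSAGE-style mean-aggregation step on a toy contact graph. It is not the StellarGraph code used in this repo: the names (`contacts`, `features`, `sage_mean_layer`) and all numbers are made up for illustration, and the step omits the neighbour sampling, learned weight matrices and nonlinearity that a trained GraphSAGE layer applies.

```python
# Toy contact graph: nodes are people, edges are recorded contacts.
# All names and values are hypothetical, chosen only for illustration.
contacts = [("ann", "bob"), ("bob", "cy"), ("cy", "dee"), ("ann", "cy")]

# Hypothetical 2-dimensional feature vector per person, standing in for
# features derived from a vulnerability index.
features = {
    "ann": [0.9, 0.1],
    "bob": [0.2, 0.4],
    "cy": [0.5, 0.5],
    "dee": [0.1, 0.8],
}


def build_adjacency(edges):
    """Return an undirected adjacency map {node: sorted list of neighbours}."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return {node: sorted(nbrs) for node, nbrs in adj.items()}


def mean_vector(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def sage_mean_layer(adj, feats):
    """Concatenate each node's own features with the mean of its neighbours'.

    This is only the aggregation part of a GraphSAGE mean layer; no weights
    are learned or applied here.
    """
    out = {}
    for node, nbrs in adj.items():
        neighbour_mean = mean_vector([feats[n] for n in nbrs])
        out[node] = list(feats[node]) + neighbour_mean
    return out


adjacency = build_adjacency(contacts)
hidden = sage_mean_layer(adjacency, features)
for person in sorted(hidden):
    print(person, [round(x, 3) for x in hidden[person]])
```

Each output vector now mixes a person's own features with those of their direct contacts, which is the kind of neighbourhood information a contact list or a vulnerability score alone does not expose.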
Figure 1 below shows a t-SNE projection of the data onto three dimensions for a simulated case study of 27 infected, 519 susceptible, and 419 exposed individuals:
This is a 2-D projection of the same t-SNE embedding:
Accuracy and loss plots for the training dataset during the initial from-scratch training:
The dataset consists of fictional contacts built from the Cora dataset link data, with node features taken from the COVID-19 vulnerability index example data found here: https://github.com/closedloop-ai/cv19index.
The following ROC curve shows the performance on test data for the infected, exposed and susceptible test classes, respectively:
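For readers who want to reproduce a per-class score like the area under such a curve, the following is a minimal sketch of a one-vs-rest ROC AUC computed by pairwise ranking. The function name `one_vs_rest_auc`, the labels and the probabilities are all made up for illustration and are not taken from this repo's results.

```python
def one_vs_rest_auc(y_true, scores, positive):
    """ROC AUC for `positive` versus all other classes.

    Computed as the fraction of (positive, negative) pairs in which the
    positive example gets the higher score; ties count as one half.
    """
    pos = [s for y, s in zip(y_true, scores) if y == positive]
    neg = [s for y, s in zip(y_true, scores) if y != positive]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical test labels and predicted probabilities for the infected class.
labels = ["infected", "exposed", "susceptible",
          "infected", "exposed", "susceptible"]
p_infected = [0.8, 0.3, 0.1, 0.4, 0.6, 0.2]

auc_infected = one_vs_rest_auc(labels, p_infected, "infected")
print(auc_infected)  # 0.875
```

The same function can be called once per class (infected, exposed, susceptible) with that class's predicted probabilities to get one score per curve.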
Confusion matrix for the susceptible class:
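A single-class confusion matrix of this kind treats the chosen class as positive and the other two as negative. As a minimal sketch, with made-up labels and a hypothetical helper `one_vs_rest_confusion` that is not part of this repo:

```python
def one_vs_rest_confusion(y_true, y_pred, positive):
    """Count true/false positives and negatives for one class versus the rest."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for truth, pred in zip(y_true, y_pred):
        if truth == positive and pred == positive:
            counts["tp"] += 1
        elif truth != positive and pred == positive:
            counts["fp"] += 1
        elif truth == positive:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Hypothetical true and predicted classes for six test contacts.
y_true = ["susceptible", "exposed", "infected",
          "susceptible", "exposed", "susceptible"]
y_pred = ["susceptible", "susceptible", "infected",
          "exposed", "exposed", "susceptible"]

cm = one_vs_rest_confusion(y_true, y_pred, "susceptible")
print(cm)  # {'tp': 2, 'fp': 1, 'fn': 1, 'tn': 2}
```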
- StellarGraph
- NetworkX
- scikit-learn
- Python 3
- TensorFlow >= 2.0
- Keras > 2.3
- pandas
Using Anaconda: conda env create -f deep-trace.yml
Note: the requirements.txt contains many extraneous packages used in other projects, so you won't need all of them.
stellargraph: https://pypi.org/project/stellargraph/
graphsage paper: https://arxiv.org/pdf/1706.02216.pdf
graph node embeddings: https://github.com/stellargraph/stellargraph/blob/develop/demos/node-classification/graphsage-node-classification.ipynb
compartmental modeling: https://en.wikipedia.org/wiki/Compartmental_models_in_epidemiology