- Based on DGL (Deep Graph Library)
- Synchronous training approach; mini-batches are formed from ego-networks
- Min-cut partitioning algorithm.
- Training GNNs in a distributed setting is challenging because of the inherent dependencies between nodes (vertices): a node's neighbor can live on a completely different machine
- Each mini-batch must incorporate the dependent samples (1-hop, 2-hop, etc. neighbors), so the sampling algorithm matters.
- As opposed to regular distributed NN training, where the majority of network traffic comes from exchanging gradients, distributed GNN traffic comes from reading hundreds of neighbor vertices per mini-batch.
- Mini-batch training:
- Sample a set of N vertices uniformly at random from the training set
- Pick at most K neighbor vertices (fan-out) for each target
- Compute the target vertex representations by gathering messages from the sampled neighbors (see the sampling sketch below)
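- A minimal sketch of this mini-batch construction in plain Python (not DGL's sampler API); `adj` (vertex -> neighbor list), `train_ids`, and `fanouts` are placeholder inputs:

```python
import random

def sample_minibatch(adj, train_ids, batch_size, fanouts):
    """Sample N target vertices and their multi-hop neighborhoods."""
    targets = random.sample(train_ids, batch_size)           # N targets, uniform at random
    frontier, hops = set(targets), []
    for k in fanouts:                                        # one fan-out per GNN layer
        sampled_edges = []
        for v in frontier:
            nbrs = adj[v]
            picked = random.sample(nbrs, min(k, len(nbrs)))  # at most K (fan-out) neighbors
            sampled_edges += [(u, v) for u in picked]
        hops.append(sampled_edges)
        frontier = {u for u, _ in sampled_edges}             # expand to the next hop
    return targets, hops  # hops[0] = edges into the targets; deeper hops follow
```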
- Samplers: sample the mini-batch graph structures from the input graph
- KVStore: distributed store for all vertex and edge data (so any trainer can fetch the features its mini-batch needs, no matter which machine holds them)
- Special kv-store instead of Redis or similar because we need better co-location of node/edge features with the graph partitions
- Efficient updates for sparse embeddings
- Flexible partition policies to map data to different machines
- Supports sparse embeddings for training transductive models with learnable vertex embeddings (see the sketch after this list)
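- Toy sketch of a partition-aware KV store (not DGL's actual KVStore API), showing a partition policy, co-located feature shards, and sparse embedding updates; all names here are hypothetical:

```python
import numpy as np

class ToyKVStore:
    def __init__(self, num_parts, part_of):
        self.part_of = part_of                            # partition policy: vertex id -> machine
        self.shards = [dict() for _ in range(num_parts)]  # one feature shard per machine

    def push(self, ids, feats):
        # Store each vertex feature on the machine that owns its graph partition.
        for i, f in zip(ids, feats):
            self.shards[self.part_of[i]][i] = f

    def pull(self, ids):
        # Fetch features for a mini-batch; lookups are local when the edge-cut is small.
        return np.stack([self.shards[self.part_of[i]][i] for i in ids])

    def sparse_update(self, ids, grads, lr=0.01):
        # Only the learnable embeddings touched by the mini-batch are updated.
        for i, g in zip(ids, grads):
            self.shards[self.part_of[i]][i] -= lr * g
```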
- Trainers:
- 1st: fetch the mini-batch graphs from the samplers and corresponding vertex/edge features from KVStore
- 2nd: run forward/backward in parallel
- 3rd: synchronize the dense model update across trainers (see the training-loop sketch below)
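- Hedged sketch of one trainer's loop; `sampler`, `kvstore`, and the model's call signature are hypothetical placeholders, and the dense part is synchronized with PyTorch's DistributedDataParallel (gradient all-reduce):

```python
from torch.nn.parallel import DistributedDataParallel as DDP

def train(model, kvstore, sampler, optimizer, num_steps):
    # Assumes torch.distributed.init_process_group() has already been called.
    model = DDP(model)                               # dense gradients are all-reduced on backward
    for _ in range(num_steps):
        targets, blocks, input_ids = sampler.next()  # 1st: mini-batch graph from the samplers
        feats = kvstore.pull(input_ids)              #      vertex features from the KVStore
        loss = model(blocks, feats, targets)         # 2nd: forward pass
        optimizer.zero_grad()
        loss.backward()                              #      backward in parallel across trainers
        optimizer.step()                             # 3rd: synchronized dense model update
```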
- Graph partitioning: Uses METIS
- Densely connected vertices are assigned to the same partition to reduce the number of edge cuts
- Minimizing the edge-cut improves GNN training performance because remote KVStore lookups become less frequent (partitioning sketch below)
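- Sketch of METIS-based partitioning; the `dgl.distributed.partition_graph` call and its arguments are assumed from DGL's distributed API and may differ across versions:

```python
import dgl
import torch

# Stand-in random graph with 64-dim features; replace with the real dataset.
g = dgl.rand_graph(10_000, 200_000)
g.ndata['feat'] = torch.randn(g.num_nodes(), 64)

# METIS assigns densely connected vertices to the same partition (min edge-cut),
# writing one partition per machine to disk.
dgl.distributed.partition_graph(
    g, graph_name='toy', num_parts=4,
    out_path='partitions/', part_method='metis')
```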