Graph ML Course Project: Group 06

This is our project for the course GMLFA: Graph Machine Learning Foundations and Applications.

Problem Statement

Predict the category of each product in a multi-class classification setup with 47 top-level categories, using an undirected, unweighted graph that represents an Amazon product co-purchasing network.

Dataset

ogbn-products [1]: an undirected, unweighted graph representing an Amazon product co-purchasing network. Nodes represent products sold on Amazon, and an edge between two products indicates that they are purchased together.

Why is this project interesting

The ogbn-products dataset is a benchmark that lets the field move beyond extremely small graph datasets and encourages the development of scalable, mini-batch-based graph models. It also uses a realistic split based on the sales ranking of the products rather than a random split, which offers an opportunity to improve out-of-distribution generalization. The project is particularly interesting because it can reveal which categories of products customers usually buy together. This information can help companies increase sales and make it easier for customers to find related products.
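To illustrate what a sales-rank split looks like compared with a random one, here is a minimal sketch. The function name `split_by_sales_rank`, the `sales_rank` input, and the toy data are hypothetical and not part of this repository; the default fractions (top 8% for training, next 2% for validation, rest for testing) follow the split described in the OGB paper [1]. In practice the split indices ship with the dataset and do not need to be recomputed.

```python
def split_by_sales_rank(sales_rank, train_frac=0.08, valid_frac=0.02):
    """Split node ids by sales rank (hypothetical helper, for illustration).

    sales_rank maps node id -> rank, where rank 1 is the best seller.
    The best-selling products go to train, the next slice to valid,
    and everything else to test.
    """
    ordered = sorted(sales_rank, key=sales_rank.get)  # best sellers first
    n = len(ordered)
    n_train = int(n * train_frac)
    n_valid = int(n * valid_frac)
    return {
        "train": ordered[:n_train],
        "valid": ordered[n_train:n_train + n_valid],
        "test": ordered[n_train + n_valid:],
    }


# Toy example: 100 products whose rank is simply id + 1.
toy_rank = {node: node + 1 for node in range(100)}
split = split_by_sales_rank(toy_rank)
```

Because the test set contains only lower-ranked products, a model evaluated this way has to generalize to products that differ systematically from the ones it was trained on.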

Core Architecture

SAGN [2] (Scalable and Adaptive Graph Neural Networks) is our base graph neural network architecture, chosen because it works well on large datasets like the one used here. It is a very expressive classifier, as it uses an inception-like module and learnable attention weights. On top of this architecture, we plan to implement the Reliable Label Utilization (RLU) module used with the GAMLP [3] (Graph Attention Multi-Layer Perceptron) architecture to make better use of the soft labels predicted by the classifier.
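The idea of combining several neighbourhood ranges with attention weights can be sketched in plain Python. This is a simplified illustration and not the code in models.py or layers.py: it uses mean aggregation over neighbours, a single dot-product score per hop, and a fixed `attn_vector` standing in for a learnable parameter. All function names here are hypothetical.

```python
import math


def mean_aggregate(adj, feats):
    """One hop: each node takes the mean of its neighbours' features."""
    out = []
    for node, neighbours in enumerate(adj):
        if not neighbours:
            out.append(list(feats[node]))  # isolated node keeps its features
            continue
        dim = len(feats[node])
        out.append([sum(feats[j][d] for j in neighbours) / len(neighbours)
                    for d in range(dim)])
    return out


def hop_features(adj, feats, num_hops):
    """Return [hop 0, hop 1, ..., hop num_hops] feature matrices."""
    hops = [feats]
    for _ in range(num_hops):
        hops.append(mean_aggregate(adj, hops[-1]))
    return hops


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend_over_hops(hops, attn_vector):
    """Per node, weight each hop's features by a softmax over hop scores."""
    combined = []
    for v in range(len(hops[0])):
        scores = [sum(a * x for a, x in zip(attn_vector, hop[v])) for hop in hops]
        weights = softmax(scores)
        dim = len(hops[0][v])
        combined.append([sum(w * hop[v][d] for w, hop in zip(weights, hops))
                         for d in range(dim)])
    return combined


# Toy path graph 0 - 1 - 2 with 2-dimensional features.
adj = [[1], [0, 2], [1]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hops = hop_features(adj, feats, num_hops=1)
combined = attend_over_hops(hops, attn_vector=[0.0, 0.0])
```

With an all-zero `attn_vector` every hop receives the same weight, so the result is a plain average over hops; a trained attention vector would let each node emphasize the hop range that is most informative for it. Since the hop features depend only on the graph and the input features, they can be computed once before training, which is what makes this family of models scale to large graphs.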

File Structure

dataset.py: Code for loading the dataset and the evaluator using the OGB framework.
layers.py: Implementation of the layers used in the model.
models.py: Definition of the SAGN architecture.
pre_process.py: Pre-processing functions, including the neighbourhood aggregation function for the initial set of node features.
train_utils.py: The train and evaluation loops for one epoch. The loss function is computed here.
train.py: The main training loop of the architecture. The complete training pipeline, including the multiple stages, is defined here.
utils.py: Utility functions used across the whole project.
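One way a multi-stage pipeline can reuse the classifier's predicted soft labels between stages is to keep only the confident predictions on unlabelled nodes. The sketch below illustrates that idea under stated assumptions: the function `select_reliable_nodes`, the threshold of 0.9, and the toy data are hypothetical and are not taken from train.py.

```python
def select_reliable_nodes(soft_labels, labelled, threshold=0.9):
    """Pick unlabelled nodes whose top predicted probability is high enough.

    soft_labels maps node id -> list of class probabilities.
    labelled is the set of node ids that already have ground-truth labels.
    Returns {node id: predicted class} for the confident unlabelled nodes.
    """
    selected = {}
    for node, probs in soft_labels.items():
        if node in labelled:
            continue  # ground-truth labels are never overwritten
        confidence = max(probs)
        if confidence >= threshold:
            selected[node] = probs.index(confidence)
    return selected


# Toy predictions for four nodes and three classes.
soft_labels = {
    0: [0.95, 0.03, 0.02],  # confident, unlabelled -> selected
    1: [0.40, 0.35, 0.25],  # not confident -> skipped
    2: [0.05, 0.92, 0.03],  # confident, unlabelled -> selected
    3: [0.10, 0.10, 0.80],  # already labelled -> skipped
}
reliable = select_reliable_nodes(soft_labels, labelled={3})
```

The selected nodes and their predicted classes could then be added to the training signal of the next stage. A higher threshold admits fewer but cleaner pseudo-labels, so the value is a trade-off that would need tuning on the validation set.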

Note

This code is loosely based on two GitHub repos, SAGN and GAMLP. Our work has been to come up with a design that incorporates both approaches, together with extensive commenting and cleanup of the code.

References

[1] Hu, W., Fey, M., Zitnik, M., Dong, Y., Ren, H., Liu, B., Catasta, M., & Leskovec, J. (2020). Open Graph Benchmark: Datasets for Machine Learning on Graphs. arXiv. https://doi.org/10.48550/arXiv.2005.00687
[2] Sun, C., & Wu, G. (2021). Scalable and Adaptive Graph Neural Networks with Self-Label-Enhanced training. CoRR, abs/2104.09376.
[3] Zhang, W., Yin, Z., Sheng, Z., Ouyang, W., Li, X., Tao, Y., Yang, Z., & Cui, B. (2021). Graph Attention Multi-Layer Perceptron. CoRR, abs/2108.10097. (https://arxiv.org/abs/2206.04355)

ShrinivasSK/GMLFA-Course-Project
