This repository consists of a solution that includes the analysis of financial data and detection of Fraud detection using Graph Machine Learning. The solution uses a Relational Graph Convolutional Network that generates unique graph features from neighbourhood information to aid in better and effective detection of fraud.
The setps are as follows:
- Pre-processing
- Making edgelists using the user identity columns
- Generating a multi dimensional heterogenous graph using the data along with these edgelists
- Using the Deep Learning models to generate predictions
The dataset used was the IEEE-CIS Fraud Detection Dataset provided by Vesta on Kaggle
Two RGCN (Relational Graph Convolutional Networks) were developed and tested
- Shallow RGCN
- Deep RGCN
Despite the heavy class imbalance, the Deep RGCN produced great results and outperformed the shallow RGCN. The evaluation metric scores were as follows:
- F1: 0.6228 (Shallow network 0.48)
- Accuracy: 98% (Shallow network 97.48%)
- Precision: 0.8872 (Shallow network 0.8240)
- Recall: 0.4798 (Shallow network 0.3410)
- Confusion matrix:
Predicte Positive | Predicted Negative | |
---|---|---|
Positive | 1950 | 248 |
Negative | 2114 | 113796 |
The architecture of the solution is as follows:
To run the code, simply run the Jupyter notebooks in this order:
- DataPrep
- Modelling
- Visualization
Detecting fraud in heterogeneous networks using Amazon SageMaker and Deep Graph Library