This GitHub repository contains the code and tools used in the research paper *Robustness of graph embedding methods for community detection* ([arXiv:2405.00636](https://arxiv.org/abs/2405.00636)). The project focuses on evaluating the robustness of community detection methods based on graph embeddings.
If you use this code, please cite the paper:
```bibtex
@article{wei2024robustness,
  title={Robustness of graph embedding methods for community detection},
  author={Wei, Zhi-Feng and Moriano, Pablo and Kannan, Ramakrishnan},
  journal={arXiv preprint arXiv:2405.00636},
  year={2024},
  url={https://arxiv.org/abs/2405.00636}
}
```
Below is a brief guide to the organization of the repository.
In the folder `0. Package WGE`, you will find the basic tools and functions we designed to carry out various tasks. They are bundled as the package `WGE`, which can be installed from that folder.
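For instance, assuming the folder ships a standard `setup.py` or `pyproject.toml`, the package can be installed with pip from the repository root (the quotes are needed because the folder name contains spaces):

```bash
pip install "./0. Package WGE"
```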
In the folder `1. Real World Graph Pre-Processing`, you will find two real-world networks stored as `.gml` files. The notebook `GML_2_NetworkX & Test.ipynb` extracts the edge list and the community membership of nodes in these networks; it also tests the community-size distributions and degree sequences of both networks.
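A minimal sketch of this workflow (the file name and the `community` node attribute are assumptions; check the actual `.gml` files and the notebook for the exact keys):

```python
import networkx as nx

# Load a real-world network from a GML file (file name is illustrative).
G = nx.read_gml("network.gml")

# Extract the edge list.
edges = list(G.edges())

# Extract community membership; the attribute name "community" is an
# assumption -- inspect the .gml file for the actual key.
membership = nx.get_node_attributes(G, "community")

# Quick sanity checks in the spirit of the notebook's tests.
degrees = [d for _, d in G.degree()]
print(f"{G.number_of_nodes()} nodes, {len(edges)} edges, max degree {max(degrees)}")
```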
Navigate to `2. Graph Generation and Pre-Processing` to find the files (`Gene_1k.py`, `Gene_1w.py`, `Gene_1k.ipynb`, `Gene_1w.ipynb`) for generating LFR networks. The betweenness centrality of the network nodes is also calculated and written out. The notebook `Gene_Btwn_Rank.ipynb` ranks nodes by betweenness centrality.
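For illustration, the following sketch generates a small LFR benchmark graph with NetworkX and ranks its nodes by betweenness centrality; the parameter values are placeholders, not the ones used in the paper:

```python
import networkx as nx

# Generate an LFR benchmark graph; all parameters here are illustrative.
G = nx.LFR_benchmark_graph(
    n=1000,            # number of nodes
    tau1=2.5,          # degree distribution exponent
    tau2=1.5,          # community size distribution exponent
    mu=0.2,            # mixing parameter
    average_degree=10,
    min_community=20,
    seed=42,
)

# Compute betweenness centrality and rank nodes from highest to lowest.
btwn = nx.betweenness_centrality(G)
rank = sorted(btwn, key=btwn.get, reverse=True)
print(rank[:10])  # the ten most central nodes
```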
In the folder `3. Edge Removal Process`, there are two programs: `remove.py` for synthetic LFR graphs and `remove_real.py` for real-world graphs. These programs generate the sequences of edges and nodes to be removed. Files with the extension `.stoch_rmv` contain edge removals based on random node selection, while `.rank_rmv` files contain edge removals based on targeted node selection by betweenness-centrality rank.
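The sketch below illustrates, in simplified form, how a node-selection order (random vs. betweenness-ranked) can be turned into an edge-removal sequence; the actual file formats written by `remove.py` and `remove_real.py` are not reproduced here:

```python
import random
import networkx as nx

def edge_removal_sequence(G, node_order):
    """Collect the edges incident to each node, following node_order."""
    seen, sequence = set(), []
    for u in node_order:
        for e in G.edges(u):
            edge = tuple(sorted(e))
            if edge not in seen:
                seen.add(edge)
                sequence.append(edge)
    return sequence

G = nx.karate_club_graph()  # stand-in graph for illustration

# Random node selection (-> .stoch_rmv files).
stoch_order = random.sample(list(G.nodes()), k=G.number_of_nodes())
stoch_seq = edge_removal_sequence(G, stoch_order)

# Targeted selection by betweenness-centrality rank (-> .rank_rmv files).
btwn = nx.betweenness_centrality(G)
rank_order = sorted(btwn, key=btwn.get, reverse=True)
rank_seq = edge_removal_sequence(G, rank_order)
```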
Navigate to the `4. Main` folder to find the programs `Graph_Disturb.py` and `Graph_Disturb_real.py`. These programs calculate element-centric similarity (ECS) scores as progressively more edges are removed from the graphs.
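ECS compares two partitions of the same node set. One possible way to compute it is with the `clusim` package; this is only a sketch with toy memberships, and the actual scripts may compute ECS differently:

```python
from clusim.clustering import Clustering
import clusim.sim as sim

# Ground-truth communities vs. communities detected after edge removal
# (toy memberships; each node maps to a list of cluster labels).
true_part = Clustering(elm2clu_dict={0: [0], 1: [0], 2: [1], 3: [1]})
found_part = Clustering(elm2clu_dict={0: [0], 1: [0], 2: [0], 3: [1]})

# Element-centric similarity score in [0, 1]; alpha=0.9 is clusim's default.
score = sim.element_sim(true_part, found_part, alpha=0.9)
print(score)
```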
Detailed instructions and examples for running the code are provided in the README file of each folder.
Our computations were performed on the Big Red 200 supercomputer, an HPE Cray EX system at Indiana University that supports scientific and medical research, as well as advanced research in artificial intelligence, machine learning, and data analytics.
- Specifications:
- 640 compute nodes with 256 GB of memory each
- Two 64-core, 2.25 GHz, 225-watt AMD EPYC 7742 processors per node
- 64 GPU-accelerated nodes with 256 GB of memory
- Single 64-core, 2.0 GHz, 225-watt AMD EPYC 7713 processor per GPU node
- Four NVIDIA A100 GPUs per GPU node
- Theoretical peak performance (Rpeak) of nearly 7 petaFLOPS
- Managed with HPE's Performance Cluster Manager (HPCM)
- Operating System: SUSE Enterprise Linux Server (SLES) version 15 on compute, GPU, and login nodes
The following modules are loaded for the project on Big Red 200:
```bash
module load cudatoolkit
module load python/gpu/3.10.5
```
All the Python packages required to run our experiments are listed in the `requirements.txt` file.
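In a fresh Python environment, a typical pip-based installation would be:

```bash
pip install -r requirements.txt
```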
This project is licensed under the MIT License - see the LICENSE.md file for details.