GitHub - shayantabrizi/PPC: Personalized PageRank Clustering: A Graph Clustering Algorithm Based on Random Walks

shayantabrizi / PPC Public

Notifications You must be signed in to change notification settings
Fork 0
Star 1

Personalized PageRank Clustering: A Graph Clustering Algorithm Based on Random Walks

Notifications

Name		Name	Last commit message	Last commit date
Latest commit History 5 Commits
Datasets/test		Datasets/test
fib		fib
Makefile		Makefile
clusterer.cpp		clusterer.cpp
clusterer.h		clusterer.h
graph.cpp		graph.cpp
graph.h		graph.h
main.cpp		main.cpp
network.cpp		network.cpp
network.h		network.h
node.cpp		node.cpp
node.h		node.h
nodeStarLinkList.cpp		nodeStarLinkList.cpp
nodeStarLinkList.h		nodeStarLinkList.h
randomNumberGenerator.cpp		randomNumberGenerator.cpp
randomNumberGenerator.h		randomNumberGenerator.h
readme.txt		readme.txt

Repository files navigation

-----------------------------------------------------------------------------

PPC: A scalable graph clustering algorithm

Based on the article "Personalized PageRank Clustering: A Graph Clustering Algorithm Based on Random Walks"
Authors: S.A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, M.A. Tavallaie
Version: 1.3

-----------------------------------------------------------------------------

Code Author : M. Abbasi
Email : ma.abbasi@ut.ac.ir
University : University of Tehran, Iran

-----------------------------------------------------------------------------

Please cite:
Shayan A. Tabrizi, Azadeh Shakery, Masoud Asadpour, Maziar Abbasi, Mohammad Ali Tavallaie, Personalized PageRank Clustering: A graph clustering algorithm based on random walks, Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772-5785, http://dx.doi.org/10.1016/j.physa.2013.07.021

Disclaimer:
We really appreciate bug reports and any suggestions for improving the algorithm. Please contact us via email at s.tabrizi@ut.ac.ir.

Usage:
The second argument is optional and changes the random number generator seed.

1) For simple undirected graphs:
./clusterer test 0

clusterer reads Datasets/test/test.txt
test.txt consists of the network edges in the format of "src dst"

2) For weighted undirected graphs:
./weclusterer test 0
./weintclusterer test 0

weintclusterer is optimized for integer edge weights.
weclusterer (weintclusterer) reads Datasets/test/test.txt
test.txt consists of network edges in the format of "src dst weight"
In case of conflicts, e.g. "2 5 2.5", "2 5 3.5" and "5 2 3", they consider "2 5 3.5", in which the weight is the maximum of the input weights.

Program automatically generates Datasets/test/test.smg for future use and it might take more time in the first run. In the future releases, this part should be more optimized.
if you have a .smg file, you don't need the .txt file anymore.

if you like to compile the program manually, here is what you need:
-DcompLimit=40 you can set a limit for the number of clusters
-DexactClustersCount using it in conjunction with -DcompLimit produces exactly compLimit clusters (Useful for when the number of clusters is known) (compLimit must be not higher than the number of vertices)
-DpageRankAlpha=0.7 to change the PageRank alpha value
-DTRIAL_DEFAULT=50 minimum number of random jumps starting from a vertex
-DTRIAL_DEFUALT_MULT=10 number of random jumps starting from a vertex = max{TRIAL_DEFUALT_MULT * node's degree, DTRIAL_DEFAULT}
-DHA creates .tree output file
-Dhier creates .tree output file: with hierarchy -> Datasets/football/hier_football.tree
-Dcreategraphml creates .graphml output file
-DWE for weighted graphs with float weights
-DWE -DintWeight for weighted graphs with integer weights

-----------------------------------------------------------------------------

Change Log:
Release 1.3:
Bug Fixes:
Fixed yet another bug in reading big graph files.
Improvements:
Moving a node from non-seeds to seeds only if it is connected to at least one seed node.
Release 1.2:
Bug Fixes:
Fixed a bug in reading big graph files (more than about 60000 vertices).
Release 1.1:
New Features:
Added support for determining the exact number of clusters.
Bug Fixes:
Fixed some issues.
Release 1.0