-
Notifications
You must be signed in to change notification settings - Fork 0
/
readme.txt
75 lines (56 loc) · 3.26 KB
/
readme.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
-----------------------------------------------------------------------------
PPC: A scalable graph clustering algorithm
Based on the article "Personalized PageRank Clustering: A Graph Clustering Algorithm Based on Random Walks"
Authors: S.A. Tabrizi, A. Shakery, M. Asadpour, M. Abbasi, M.A. Tavallaie
Version: 1.3
-----------------------------------------------------------------------------
Code Author : M. Abbasi
Email : ma.abbasi@ut.ac.ir
University : University of Tehran, Iran
-----------------------------------------------------------------------------
Please cite:
Shayan A. Tabrizi, Azadeh Shakery, Masoud Asadpour, Maziar Abbasi, Mohammad Ali Tavallaie, Personalized PageRank Clustering: A graph clustering algorithm based on random walks, Physica A: Statistical Mechanics and its Applications, vol. 392, no. 22, pp. 5772-5785, http://dx.doi.org/10.1016/j.physa.2013.07.021
Disclaimer:
We really appreciate bug reports and any suggestions for improving the algorithm. Please contact us via email at s.tabrizi@ut.ac.ir.
Usage:
The second argument is optional and changes the random number generator seed.
1) For simple undirected graphs:
./clusterer test 0
clusterer reads Datasets/test/test.txt
test.txt consists of the network edges in the format of "src dst"
2) For weighted undirected graphs:
./weclusterer test 0
./weintclusterer test 0
weintclusterer is optimized for integer edge weights.
weclusterer (weintclusterer) reads Datasets/test/test.txt
test.txt consists of network edges in the format of "src dst weight"
In case of conflicts, e.g. "2 5 2.5", "2 5 3.5" and "5 2 3", they consider "2 5 3.5", in which the weight is the maximum of the input weights.
Program automatically generates Datasets/test/test.smg for future use and it might take more time in the first run. In the future releases, this part should be more optimized.
if you have a .smg file, you don't need the .txt file anymore.
if you like to compile the program manually, here is what you need:
-DcompLimit=40 you can set a limit for the number of clusters
-DexactClustersCount using it in conjunction with -DcompLimit produces exactly compLimit clusters (Useful for when the number of clusters is known) (compLimit must be not higher than the number of vertices)
-DpageRankAlpha=0.7 to change the PageRank alpha value
-DTRIAL_DEFAULT=50 minimum number of random jumps starting from a vertex
-DTRIAL_DEFUALT_MULT=10 number of random jumps starting from a vertex = max{TRIAL_DEFUALT_MULT * node's degree, DTRIAL_DEFAULT}
-DHA creates .tree output file
-Dhier creates .tree output file: with hierarchy -> Datasets/football/hier_football.tree
-Dcreategraphml creates .graphml output file
-DWE for weighted graphs with float weights
-DWE -DintWeight for weighted graphs with integer weights
-----------------------------------------------------------------------------
Change Log:
Release 1.3:
Bug Fixes:
Fixed yet another bug in reading big graph files.
Improvements:
Moving a node from non-seeds to seeds only if it is connected to at least one seed node.
Release 1.2:
Bug Fixes:
Fixed a bug in reading big graph files (more than about 60000 vertices).
Release 1.1:
New Features:
Added support for determining the exact number of clusters.
Bug Fixes:
Fixed some issues.
Release 1.0