- F1 scores for overlapping communities on multiple resolutions, and the standard NMI for hard partitioning only (non-overlapping single-resolution clustering), are evaluated by xmeasures.
- NMI (Normalized Mutual Information) for overlapping multi-resolution clustering (NMI_max, compatible with the standard NMI) is evaluated by GenConvMI. GenConvMI is the extended version of gecmi (paper: Comparing network covers using mutual information by Alcides Viamontes Esquivel and Martin Rosvall).
- NMIs (NMI_max, NMI_lfr, NMI_avg) are evaluated by OvpNMI. OvpNMI is the extended version of onmi (paper: Normalized Mutual Information to evaluate overlapping community finding algorithms by Aaron F. McDaid, Derek Greene and Neil Hurley).
- Standard modularity `Q`, made applicable to overlapping communities.
- Conductance `f`, applicable to overlapping communities.
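For orientation, the hard-partition (non-overlapping) forms of these measures fit in a few lines of Python. This is a minimal sketch under stated assumptions, not the code of xmeasures, GenConvMI, OvpNMI or daoc, and it does not cover their overlapping or multi-resolution extensions: NMI is normalized by max(H(A), H(B)) (the NMI_max convention), modularity is Newman's `Q` for an undirected weighted edge list with each link listed once, and conductance is cut(S) / min(vol(S), vol(rest)). The function names are hypothetical.

```python
from collections import Counter
from math import log


def nmi_max(labels_a, labels_b):
    """NMI of two hard partitions given as per-node cluster labels,
    normalized by max(H(A), H(B))."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("both partitions must label the same non-empty node list")
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    h_a = -sum(c / n * log(c / n) for c in count_a.values())
    h_b = -sum(c / n * log(c / n) for c in count_b.values())
    # I(A;B) = sum p(a,b) * log(p(a,b) / (p(a) * p(b)))
    mi = sum(c / n * log(c * n / (count_a[a] * count_b[b]))
             for (a, b), c in joint.items())
    denom = max(h_a, h_b)
    return 1.0 if denom == 0 else mi / denom


def modularity(edges, community):
    """Newman modularity Q of a hard partition of an undirected weighted graph.

    edges: iterable of (u, v, weight), each undirected link listed once;
    community: dict mapping every node to its community id.
    """
    total = 0.0           # m: total link weight
    internal = Counter()  # link weight inside each community
    strength = Counter()  # summed weighted degrees of the nodes of each community
    for u, v, w in edges:
        total += w
        strength[community[u]] += w
        strength[community[v]] += w
        if community[u] == community[v]:
            internal[community[u]] += w
    return sum(internal[c] / total - (strength[c] / (2 * total)) ** 2
               for c in strength)


def conductance(edges, cluster):
    """cut(S) / min(vol(S), vol(complement of S)) for the node set S = cluster."""
    cut = vol_in = vol_out = 0.0
    for u, v, w in edges:
        inside = (u in cluster) + (v in cluster)  # 0, 1 or 2 endpoints in S
        vol_in += w * inside
        vol_out += w * (2 - inside)
        if inside == 1:
            cut += w
    denom = min(vol_in, vol_out)
    if denom == 0:
        raise ValueError("conductance is undefined when S or its complement has no links")
    return cut / denom
```

For example, two triangles joined by a single link and split into the two triangles give Q = 5/14 ≈ 0.357, and either triangle has a conductance of 1/7.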
- daoc (former hirecs) is also used for the modularity and conductance evaluation of overlapping community structures (with results compatible with the respective standard modularity and conductance values). It depends on `libstdc++.so.6`, version GLIBCXX_3.4.20 (for the precompiled version used in modularity evaluation). To install it on Ubuntu, use `$ sudo apt-get install libstdc++6`, or `$ sudo add-apt-repository ppa:ubuntu-toolchain-r/test`, then `$ sudo apt-get update` and `$ sudo apt-get install libstdc++6`.
- python-igraph for the evaluation of the Louvain algorithm by NMIs (because the original implementation does not provide convenient output of the communities for NMI evaluation): `$ pip install python-igraph`. It depends on `libxml2` (and `libz` on Ubuntu 14), which are installed on Linux Ubuntu by executing `$ sudo apt-get install libxml2-dev` (`lib32z1-dev` might also be required).
- OvpNMI or gecmi for the NMI_ovp evaluation depends on:
  - `libboost_program_options`; to install it, execute `$ sudo apt-get install libboost-program-options`. The older version of gecmi compiled under Ubuntu 14 depends on `libboost_program_options.so.1.54.0`, and the newer one compiled under Ubuntu 16 depends on `libboost_program_options.so.1.58.0`.
  - `libtbb.so.2`; to install it, execute `$ sudo aptitude download libtbb2; sudo aptitude install libtbb2`.
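The python-igraph requirement above exists only to get the Louvain communities into a file the NMI tools can read. In python-igraph, `Graph.community_multilevel(weights=..., return_levels=True)` returns one `VertexClustering` per hierarchy level, each exposing a `membership` list (the cluster id of every vertex index). A minimal sketch of turning such membership lists into a plain one-cluster-per-line layout with space-separated node names; the function names and the `<prefix>_<level>.txt` file naming are hypothetical, and the exact input format each evaluation tool expects (e.g. an optional header) is an assumption to check against its documentation. igraph itself is not imported, so the sketch runs standalone.

```python
from collections import defaultdict


def membership_to_lines(membership, names=None):
    """Group vertex indices by cluster id; return one space-separated line per cluster."""
    clusters = defaultdict(list)
    for vid, cid in enumerate(membership):
        clusters[cid].append(str(names[vid]) if names is not None else str(vid))
    # Emit clusters ordered by their id for reproducible output
    return [" ".join(nodes) for _, nodes in sorted(clusters.items())]


def write_levels(levels, names, path_prefix):
    """Write each hierarchy level to its own file (hypothetical naming scheme)."""
    paths = []
    for i, membership in enumerate(levels):
        path = f"{path_prefix}_{i}.txt"
        with open(path, "w") as f:
            f.write("\n".join(membership_to_lines(membership, names)) + "\n")
        paths.append(path)
    return paths
```

With python-igraph installed, the input could come from `g = igraph.Graph.Read_Ncol("net.ncol", directed=False)` and `levels = g.community_multilevel(weights="weight", return_levels=True)` (the `weight` attribute exists only if the file has a weight column), followed by `write_levels([lvl.membership for lvl in levels], g.vs["name"], "louvain")`.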
Optional requirements of the mpepool.py load balancer:
- psutil is required for the dynamic job balancing used to perform in-RAM computations (`_LIMIT_WORKERS_RAM = True`) and to limit the memory consumption of the workers: `$ sudo pip install psutil`.
  To perform in-memory computations dedicating almost all available RAM (specifying `memlimit` ~= physical memory), it is recommended to set swappiness to 1 .. 10: `$ sudo sysctl -w vm.swappiness=5`, or set it permanently in `/etc/sysctl.conf`: `vm.swappiness = 5`.
- hwloc (which includes `lstopo`) is required to identify the enumeration type of the logical CPUs for correct CPU affinity masking. It is needed only for the automatic affinity masking with cache usage optimization, and only if the CPU enumeration type is not specified manually: `$ sudo apt-get install -y hwloc`.
- bottle is required for the optional minimalistic WebUI to monitor executing jobs: `$ sudo pip install bottle`.
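The hwloc requirement above matters because pinning a worker to all hardware threads of one physical core keeps its data in that core's private caches, yet which logical CPU ids are siblings depends on how the system enumerates them. A minimal sketch under assumptions, not mpepool.py's implementation: all cores have the same number of hardware threads, and the two enumeration labels `"consecutive"` (siblings adjacent) and `"interleaved"` (siblings `ncores` apart, a layout often seen on Linux) are hypothetical names; in practice the layout would be read from the topology that `lstopo` reports rather than assumed. `os.sched_setaffinity` is the Linux-only standard-library call that applies the mask.

```python
import os


def core_cpus(core, ncores, threads_per_core, enumeration):
    """Logical CPU ids belonging to one physical core.

    enumeration (hypothetical labels):
      "consecutive": siblings are adjacent, core k -> k*t .. k*t + t - 1
      "interleaved": siblings are ncores apart, core k -> k, k + ncores, ...
    """
    if not 0 <= core < ncores:
        raise ValueError("core index out of range")
    if enumeration == "consecutive":
        first = core * threads_per_core
        return set(range(first, first + threads_per_core))
    if enumeration == "interleaved":
        return {core + t * ncores for t in range(threads_per_core)}
    raise ValueError(f"unknown enumeration: {enumeration!r}")


def pin_to_core(pid, core, ncores, threads_per_core, enumeration):
    """Restrict process pid (0 = the caller) to the logical CPUs of one core (Linux only)."""
    os.sched_setaffinity(pid, core_cpus(core, ncores, threads_per_core, enumeration))
```

On a 4-core machine with 2 threads per core, core 1 maps to CPUs {2, 3} under consecutive enumeration but to {1, 5} under interleaved enumeration, so using the wrong assumption would place two workers on the same core.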
All Python requirements are optional and can be installed from the `pyreqs.txt` file: `$ sudo pip install -r pyreqs.txt`.
`hwloc` is a system requirement and can't be installed from `pyreqs.txt`.
Generation of the synthetic undirected weighted networks with overlaps is performed by the LFR-Benchmark, which is the extended version of the original LFR.
The optional shuffling of the input datasets is performed by the standard `shuf` Linux utility and is applicable to networks in the ncol format. Networks in other formats, or with a header, are shuffled by the `shuffleNets()` procedure of the `benchmark.py` script.
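For a header-free ncol file the shuffle is a single GNU coreutils call, e.g. `shuf net.ncol -o net_shuffled.ncol`. For files with a header, a minimal sketch of the idea (not the actual `shuffleNets()` code) keeps the leading header lines in place and shuffles only the edge lines; it assumes that header lines start with `#`, and the function names are hypothetical.

```python
import random


def split_header(lines, comment="#"):
    """Split the leading header/comment lines from the edge lines."""
    i = 0
    while i < len(lines) and lines[i].lstrip().startswith(comment):
        i += 1
    return lines[:i], lines[i:]


def shuffle_network(lines, seed=None):
    """Return the header unchanged followed by the edge lines in random order."""
    header, edges = split_header(lines)  # slices are copies, the input is untouched
    random.Random(seed).shuffle(edges)   # a seed makes the shuffle reproducible
    return header + edges
```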
The `convert.py` script is used to convert networks between formats.
The `remlinks.py` script randomly removes a specified percentage of links from the network, which is useful for evaluating the robustness of clustering algorithms.
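A minimal sketch of random link removal in the spirit of `remlinks.py` (not its actual code or command-line interface): a fixed share of the edge lines is dropped uniformly at random while the header and the order of the remaining links are preserved. It assumes an edge-list format with one line per undirected link and `#`-prefixed header lines; in a format that lists each link in both directions, removing lines would remove arcs rather than links. The function name `remove_links` is hypothetical.

```python
import random


def remove_links(lines, percent, seed=None, comment="#"):
    """Drop `percent` % of the edge lines uniformly at random, preserving order."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be within [0, 100]")
    i = 0
    while i < len(lines) and lines[i].lstrip().startswith(comment):
        i += 1
    header, edges = lines[:i], lines[i:]
    ndrop = round(len(edges) * percent / 100)
    # Sample the indices of the links to drop without replacement
    dropped = set(random.Random(seed).sample(range(len(edges)), ndrop))
    return header + [e for j, e in enumerate(edges) if j not in dropped]
```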
Clusterings produced on multiple resolutions can be merged using resmerge, which also synchronizes the node base with the ground-truth communities of the large real-world networks from SNAP. The SNAP datasets provide ground-truth communities covering fewer nodes than the input networks, so the node base of the resulting clusters must be synchronized for a fair evaluation.
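A minimal sketch of the two steps described above, under assumptions and not resmerge's actual behavior: merging concatenates the clusters of all resolutions and skips exact duplicates (whether resmerge deduplicates is an assumption here), and node base synchronization removes every node absent from the ground truth, dropping clusters that become empty. Clusters are lists of node ids; both function names are hypothetical.

```python
def merge_resolutions(levels):
    """Concatenate clusters from several resolutions, skipping exact duplicates."""
    seen, merged = set(), []
    for level in levels:
        for cluster in level:
            key = frozenset(cluster)  # the same node set counts as a duplicate
            if key not in seen:
                seen.add(key)
                merged.append(list(cluster))
    return merged


def sync_node_base(clusters, ground_truth):
    """Keep only the nodes present in the ground truth; drop clusters left empty."""
    gt_nodes = set()
    for community in ground_truth:
        gt_nodes.update(community)
    synced = []
    for cluster in clusters:
        kept = [node for node in cluster if node in gt_nodes]
        if kept:
            synced.append(kept)
    return synced
```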
Resource consumption is evaluated using the exectime profiler.
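exectime's own interface is not described here. As a rough, standard-library-only illustration of what such a profiler measures, the sketch below runs a command and reports wall-clock time, CPU time and peak resident memory of the child processes via `resource.getrusage(RUSAGE_CHILDREN)`. It is Unix-only, and the function name `profile` is hypothetical.

```python
import resource
import subprocess
import sys
import time


def profile(cmd):
    """Run cmd and report wall-clock time, CPU time and peak RSS of the children."""
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    proc = subprocess.run(cmd, check=False)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)
    return {
        "returncode": proc.returncode,
        "wall_sec": wall,
        # CPU times accumulate over all waited-for children, so take the delta
        "user_sec": after.ru_utime - before.ru_utime,
        "sys_sec": after.ru_stime - before.ru_stime,
        # ru_maxrss is the largest child so far (KiB on Linux, bytes on macOS),
        # so it reflects this command only if it is the biggest child started
        "peak_rss": after.ru_maxrss,
    }


if __name__ == "__main__":
    print(profile([sys.executable, "-c", "sum(range(10**6))"]))
```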