Utilities

Clustering Quality Measures

Extrinsic quality measures

Intrinsic quality measures (evaluated by DAOC)

  • Standard modularity Q, but applicable for overlapping communities.
  • Conductance f applicable for overlapping communities.
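
For reference, the standard (non-overlapping) definitions that these measures generalize can be sketched as follows. This is a minimal illustration and not the DAOC implementation: the function names `modularity` and `conductance`, the edge-list input, and the per-cluster form of conductance are assumptions made for this example, and the overlapping generalization evaluated by DAOC is not reproduced here.

```python
from collections import defaultdict


def modularity(edges, membership):
    """Standard modularity Q of a crisp (non-overlapping) partition.

    edges: iterable of (u, v, weight) for an undirected weighted graph.
    membership: dict mapping each node to its community id.
    """
    total = 0.0                    # total edge weight m
    internal = defaultdict(float)  # weight of edges inside each community
    volume = defaultdict(float)    # sum of weighted degrees per community
    for u, v, w in edges:
        total += w
        volume[membership[u]] += w
        volume[membership[v]] += w
        if membership[u] == membership[v]:
            internal[membership[u]] += w
    return sum(internal[c] / total - (volume[c] / (2 * total)) ** 2
               for c in volume)


def conductance(edges, cluster):
    """Standard conductance of one cluster: cut weight / smaller side volume."""
    cluster = set(cluster)
    cut = vol_in = vol_out = 0.0
    for u, v, w in edges:
        u_in, v_in = u in cluster, v in cluster
        if u_in != v_in:
            cut += w               # the edge crosses the cluster boundary
        vol_in += w * (u_in + v_in)
        vol_out += w * ((not u_in) + (not v_in))
    denom = min(vol_in, vol_out)
    return cut / denom if denom else 0.0


# Two triangles joined by a single link (hypothetical toy network)
toy_edges = [(0, 1, 1), (1, 2, 1), (0, 2, 1),
             (3, 4, 1), (4, 5, 1), (3, 5, 1), (2, 3, 1)]
toy_membership = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}
print(modularity(toy_edges, toy_membership))
print(conductance(toy_edges, {0, 1, 2}))
```

Higher modularity and lower conductance both indicate better-separated communities.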

Requirements

  • daoc (formerly hirecs) is also used for modularity and conductance evaluation of overlapping community structure (with results compatible with the respective standard modularity and conductance values). It depends on:

    • libstdc++.so.6: version GLIBCXX_3.4.20 (required by the precompiled version for modularity evaluation). To install it on Ubuntu, use $ sudo apt-get install libstdc++6 or, via the toolchain PPA:
      $ sudo add-apt-repository ppa:ubuntu-toolchain-r/test
      $ sudo apt-get update
      $ sudo apt-get install libstdc++6
  • python-igraph for Louvain algorithm evaluation by NMIs (because the original implementation does not provide convenient output of the communities to evaluate NMIs): $ pip install python-igraph. It depends on:

    • libxml2 (and libz on Ubuntu 14), which can be installed on Ubuntu by executing:
      $ sudo apt-get install libxml2-dev (lib32z1-dev might be also required)
  • OvpNMI or gecmi for the NMI_ovp evaluation, which depend on:

    • libboost_program_options. To install it, execute: $ sudo apt-get install libboost-program-options. The older version of gecmi, compiled under Ubuntu 14, depends on libboost_program_options.so.1.54.0; the newer one, compiled under Ubuntu 16, depends on libboost_program_options.so.1.58.0.
    • libtbb.so.2. To install it, execute: $ sudo aptitude download libtbb2; sudo aptitude install libtbb2

Optional requirements of the mpepool.py load balancer:

  • psutil is required for dynamic job balancing to perform in-RAM computations (_LIMIT_WORKERS_RAM = True) and to limit the memory consumption of the workers.
    $ sudo pip install psutil

    To perform in-memory computations that dedicate almost all available RAM (specifying memlimit ~= physical memory), it is recommended to set swappiness to a value in the range 1 .. 10: $ sudo sysctl -w vm.swappiness=5, or to set it permanently in /etc/sysctl.conf: vm.swappiness = 5.

  • hwloc (includes lstopo) is required to identify the enumeration type of the logical CPUs in order to perform correct CPU affinity masking. It is required only for automatic affinity masking with cache usage optimization, and only if the CPU enumeration type is not specified manually.
    $ sudo apt-get install -y hwloc
  • bottle is required for the minimalistic optional WebUI to monitor executing jobs.
    $ sudo pip install bottle

All Python requirements are optional and can be installed from the pyreqs.txt file:

$ sudo pip install -r pyreqs.txt

hwloc is a system requirement and cannot be installed from pyreqs.txt.

Data Preparation and Post-processing

Synthetic Networks Generation and Shuffling

Synthetic undirected weighted networks with overlaps are generated by the LFR-Benchmark, which is an extended version of the original LFR.

The optional shuffling of the input datasets is performed by the standard shuf Linux application and is applicable to networks in the ncol format. Networks in other formats, or networks that include a header, are shuffled by the shuffleNets() procedure of the benchmark.py script.
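
Header-aware shuffling can be illustrated with a short Python sketch. It is not the shuffleNets() implementation from benchmark.py: the function name `shuffle_links`, the `seed` parameter, and the convention that header lines start with `#` are assumptions made for this example.

```python
import random


def shuffle_links(lines, seed=None, comment="#"):
    """Shuffle the link lines of a network file, keeping the header in place.

    lines: list of text lines, e.g. "src dst [weight]" as in the ncol format.
    Leading blank or comment lines are treated as the header (an assumption).
    """
    body = list(lines)
    header = []
    while body and (not body[0].strip() or body[0].lstrip().startswith(comment)):
        header.append(body.pop(0))
    random.Random(seed).shuffle(body)  # a seeded generator gives reproducible runs
    return header + body


# Hypothetical three-link network with a one-line header
network = ["# Nodes: 3\n", "0 1 1.0\n", "1 2 0.5\n", "0 2 2\n"]
print("".join(shuffle_links(network, seed=1)))
```

Only the order of the links changes; the set of links and the header position stay the same, so the shuffled file describes the same network.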

Network Format Conversion

The convert.py script converts networks between formats.

Network Perturbation

The remlinks.py script randomly removes a specified percentage of links from the network, which is useful for evaluating the robustness of clustering algorithms.
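
The idea of random link removal can be sketched in a few lines of Python. This is an illustration only and does not reproduce remlinks.py: the function name `remove_links`, its parameters, and the rounding of the number of removed links are assumptions.

```python
import random


def remove_links(edges, percent, seed=None):
    """Return a copy of edges with about `percent` % of them removed at random.

    edges: sequence of links, e.g. (src, dst) or (src, dst, weight) tuples.
    The relative order of the remaining links is preserved.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be within [0, 100]")
    edges = list(edges)
    n_remove = round(len(edges) * percent / 100)
    # Sample the indices to drop without replacement
    drop = set(random.Random(seed).sample(range(len(edges)), n_remove))
    return [e for i, e in enumerate(edges) if i not in drop]


# Hypothetical chain network of 10 links with 20 % of the links removed
chain = [(i, i + 1) for i in range(10)]
print(remove_links(chain, 20, seed=42))
```

Running a clustering algorithm on the original and on the perturbed network, then comparing the resulting clusterings, indicates how sensitive the algorithm is to missing links.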

Clusters Post-processing

Clusterings produced at multiple resolutions can be merged using resmerge, which also synchronizes the node base with the ground-truth communities of the large real-world networks from SNAP. The SNAP datasets provide ground-truth communities that cover fewer nodes than the input networks, so the node base of the resulting clusters has to be synchronized for a fair evaluation.
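
One plausible reading of node base synchronization is to restrict each resulting cluster to the nodes that occur in the ground truth. The sketch below illustrates that reading only and is not the resmerge implementation: the function name `sync_node_base`, the list-of-lists representation, and the choice to drop clusters that become empty are assumptions.

```python
def sync_node_base(clusters, ground_truth):
    """Keep only the nodes that also occur in the ground-truth communities.

    clusters, ground_truth: iterables of node collections.
    Clusters left empty after filtering are dropped (an assumption).
    """
    base = set().union(*ground_truth)  # all nodes covered by the ground truth
    synced = []
    for cluster in clusters:
        kept = [node for node in cluster if node in base]
        if kept:
            synced.append(kept)
    return synced


# Hypothetical example: nodes 3, 5 and 6 are absent from the ground truth
found = [[1, 2, 3], [4, 5], [6]]
truth = [[1, 2], [4]]
print(sync_node_base(found, truth))  # [[1, 2], [4]]
```

After this step both clusterings are defined over the same node set, which measures such as NMI assume.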

Resource Consumption Tracing Tools

Resource consumption is evaluated using the exectime profiler.