This repository contains the source code, installation and use instructions for the methods presented in the paper: Block-Approximated Exponential Random Graphs. Instructions for replicating the experiments in the paper are also given.
We provide two different implementations, a Python version for the MaxEnt model with full structural constraints and a Matlab version for the MaxEnt using low rank approximations.
The Python version of MaxEnt can be installed directly from GitHub with:
$ pip install git+https://github.com/aida-ugent/MaxEntComb.git
It can be also installed from source with:
$ git clone https://github.com/aida-ugent/MaxEntComb.git
$ cd MaxEntComb
$ pip install -e .
Where -e
means "editable" mode.
The MaxEnt Matlab version using low rank approximations requires Matlab R2018b
or higher. The code also uses the open
source minFunc package for unconstrained differentiable
multivariate optimization by M. Schmidt. For convenience, this packages is already included in the /matlab
folder.
Note: The Python MaxEntComb code was extensively tested under Python 2.7.x. We therefore recommend this setting in order to obtain the best possible results. The code also works for other version of Python >=3.6 but it has not been thoroughly tested.
A maxentcomb
CLI command is added on pip install
which runs the model. The command takes the following
parameters:
-h, --help Show this help message and exit.
--inputgraph [INPUTGRAPH] Input graph path.
--output [OUTPUT] Path where the embeddings will be stored. Default is `network.emb`.
--tr_e [TR_E] Path of the input train edges. Default is None (in this case returns embeddings)
--tr_pred [TR_PRED] Path where the train predictions will be stored. Default is `tr_pred.csv`
--te_e [TE_E] Path of the input test edges. Default is None.
--te_pred [TE_PRED] Path where the test predictions will be stored. Default is `te_pred.csv`
--learning_rate LEARNING_RATE
Learning rate. Default is 0.1.
--epochs EPOCHS Training epochs. Default is 100.
--grad_min_norm GRAD_MIN_NORM
Early stop if grad norm is below this value. Default is 0.0001.
--optimizer OPTIMIZER Optimizer to use. Options are `newton` and `grad_desc`. Default is `newton`.
--memory MEMORY The constraints on the memory to use. Options are `quadratic` and `linear`.
Quadratic is considerably faster but used O(n^2) memory. Default is `quadratic`.
--prior PRIOR [PRIOR ...] The prior to use. Options are `CN`, `AA`, `A3`, `JC`, `PA` or list combining them.
Default is [`CN`].
--delimiter DELIMITER The delimiter used in the input edgelist. Output will use the same one. Default is `,`
--dimension DIMENSION Not used. Default is 1.
--verbose Determines the verbosity level of the output. Default is False.
NOTE: The user can define new constraint matrices F. However, these need to implement the interfaces provided
in the weighted_lin_constr.py
file.
Examples of running the Python MaxEnt model are:
# Example 1: Outputs posterior matrix after calculation
maxentcomb --inputgraph ./graph.edgelist --prior 'CN' 'PA' --optimizer 'newton'
# Example 2: Outputs predictions for the edge pairs in the tr_e and te_e files
maxentcomb --inputgraph ./graph.edgelist --tr_e ./tr.edgelist --te_e ./te.edgelist --tr_pred './tr.out' --te_pred './te.out'
The file test.m
provides an overview on how to run a complete pipeline using the Matlab MaxEnt code.
This code generates a symmetric adjacency matrix and computes its spectral decomposition. Second and third order
proximity matrices are then computed, as well as low rank approximations for Resource Allocation Index and Preferential
Attachment. A cartesian product of these block approximation is then obtained and the MaxEnt model is fitted on this
binning using the lbfgs solver from the minFunc package. Finally train and test predictions are computed for the
input edge pairs.
In order to simplify the execution of the Matlab MaxEnt version, we also provide a pre-compiled file named test
as
well as a shell script run_test.sh
for running it. The shell script takes the following parameters in this order:
inputgraph, tr_e, te_e, tr_pred, te_pred, dim, bins, weights. The descriptions of the first 5 parameters are the
same as for the Python version. As for the remaining ones:
dim: FLOAT Embedding dimensionality
bins: FLOAT Number of bins each F matrix is divided in.
weights: LIST The multiplicity for each high-order priximity of A.
E.g. weights[1]*A^2 + weights[2]*A^3 ...
An example of running the Matlab MaxEnt version:
./run_test.sh /usr/local/MATLAB/MATLAB_Runtime/v95 graph.edgelist ./tr.edgelist ./te.edgelist './tr.out' './te.out' 8 100 [1,0.1]
In order to reproduce the experiments presented in the paper the following steps are necessary:
-
Download and install the EvalNE library as instructed here
-
Download and install the MaxEnt method as shown in the Installation section above
-
Download and install the baseline methods. We recommend that each method is installed in a unique virtual environment to ensure that the right dependencies are used. For all methods except SDNE, we use the implementations provided by the original authors:
For SDNE the implementation available here is the only one which has shown similar results to those in the original paper.
-
Download the datasets:
-
Download the
.ini
configuration files from the/experiments
folder. Modify the dataset paths and method paths to match the respective installation directories and run each files as:python -m evalne ./experiments/conf_maxent.ini
NOTE: For AROPE a main.py
file is required in order to run the evaluation through EvalNE. This file
is also available in the /experiments
folder.
NOTE2: To ensure that the same results reported in the paper are obtained, we recommend that both MaxEnt and the EvalNE library are installed in independent virtual environments under Python 2.7.x.
If you have found our research useful, please consider citing our paper, which is also available on arxiv:
@INPROCEEDINGS{9260034,
author={F. {Adriaens} and A. {Mara} and J. {Lijffijt} and T. {De Bie}},
booktitle={2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA)},
title={Block-Approximated Exponential Random Graphs},
year={2020},
pages={70-80},
doi={10.1109/DSAA49011.2020.00019}}