Evaluation library for BIONIC. This library contains code to reproduce the co-annotation prediction, module detection, and gene function prediction evaluations from Figs. 2a, 3a, 4 and 5 of the BIONIC manuscript.
NOTE: The module detection and gene function prediction evaluations take a considerable amount of time to complete (on the order of hours). You can speed them up by reducing the size of the parameter search space, reducing the number of trials, or using more CPUs.
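To get a feel for how much the search space affects runtime, here is a back-of-the-envelope sketch for the function prediction evaluation. It assumes an exhaustive grid search in which every combination of kernel coefficient and regularization value is fit once per fold and per trial; that is an assumption made for illustration, not a description of how the library schedules its work. The function name `estimated_fits` and the numbers are hypothetical.

```python
def estimated_fits(trials: int, folds: int, gamma_samples: int, reg_samples: int) -> int:
    """Rough number of model fits, ASSUMING an exhaustive grid search.

    Each (gamma, regularization) pair is assumed to be fit once per
    cross-validation fold, and the whole procedure repeated per trial.
    """
    return trials * folds * gamma_samples * reg_samples


# Hypothetical settings, chosen only to illustrate the scaling.
baseline = estimated_fits(trials=10, folds=5, gamma_samples=10, reg_samples=10)
reduced = estimated_fits(trials=5, folds=5, gamma_samples=5, reg_samples=5)

print(f"baseline fits: {baseline}")
print(f"reduced fits:  {reduced}")
print(f"reduction:     {baseline / reduced:.1f}x")
```

Under this assumption, halving the number of trials and both sample counts cuts the work by roughly a factor of eight, which is why trimming the search space is usually the first thing to try.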
The library can be installed using Poetry.
- First, install Poetry.
- Create a virtual Python 3.8 environment using conda:
  $ conda create -n bionic-evals python=3.8
- Make sure your virtual environment is active for the following steps:
  $ conda activate bionic-evals
- Clone this repository by running:
  $ git clone https://github.com/duncster94/BIONIC-evals.git
- Make sure you are in the same directory as the pyproject.toml file. Install the bioniceval library as follows:
  $ poetry install
- Test that bioniceval is installed properly by running:
  $ bioniceval --help
  You should see a help message.
You can run bioniceval by passing in a config file as follows:
$ bioniceval path/to/config/file.json
The configuration file is a JSON file containing all the relevant file paths and evaluation parameters. You can have a uniquely named config file for each evaluation scenario you want to run. An example config file can be found here.
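As a rough illustration of what such a file might look like, the sketch below builds a config in Python and writes it out as JSON. The key names come from the configuration keys documented in this README, but the nesting (lists of objects under networks, features and standards), all file paths, all names, and all numeric values are assumptions made for the example; treat the linked example config as the authoritative reference.

```python
import json
import tempfile
from pathlib import Path

# All paths, names and values below are placeholders, and the nesting is assumed.
config = {
    "networks": [
        {"name": "example_network", "path": "path/to/network.txt", "delimiter": " "}
    ],
    "features": [
        {"name": "example_features", "path": "path/to/features.csv", "delimiter": ","}
    ],
    "standards": [
        {
            "name": "example_coannotation",
            "task": "coannotation",
            "path": "path/to/coannotation_standard.csv",
            "delimiter": ",",
        },
        {
            "name": "example_modules",
            "task": "module_detection",
            "path": "path/to/module_standard.json",
            "samples": 10,
            "methods": ["average"],
            "metrics": ["cosine"],
            "thresholds": 500,
        },
        {
            "name": "example_functions",
            "task": "function_prediction",
            "path": "path/to/function_standard.json",
            "test_size": 0.1,
            "folds": 5,
            "trials": 10,
            "gamma": {"minimum": 1e-6, "maximum": 1e-1, "samples": 10},
            "regularization": {"minimum": 1e-3, "maximum": 1e4, "samples": 10},
        },
    ],
    "consolidation": "union",
}

# Write the config to a temporary location so the example has no side effects
# in the working directory.
config_path = Path(tempfile.gettempdir()) / "example_config.json"
config_path.write_text(json.dumps(config, indent=4))
print(f"Wrote config to {config_path}")
```

The resulting file could then be passed to bioniceval in place of path/to/config/file.json. Keeping one such script per evaluation scenario is one way to generate uniquely named config files reproducibly.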
The configuration keys are as follows:
Argument | Description |
---|---|
Input files | |
networks.name | Name for the given network. |
networks.path | Filepath to input network. |
networks.delimiter | Delimiter of network file. |
features.name | Name for the given feature set. |
features.path | Filepath to input feature set. |
features.delimiter | Delimiter of feature file. |
Evaluation standards | |
standards.name | Name for the given standard. |
standards.task | The type of evaluation task. Valid values are "coannotation", "module_detection", and "function_prediction". |
standards.path | Filepath to standard. |
standards.delimiter | Delimiter of standard file. |
Module detection specific parameters | |
standards.samples | Number of flat module set samples to perform evaluations for. |
standards.methods | A list of valid linkage methods to perform clustering for. See here for more information. |
standards.metrics | A list of valid distance metrics to perform clustering for. See here for more information. |
standards.thresholds | Number of clustering thresholds to extract clusters for and evaluate. |
Function prediction specific parameters | |
standards.test_size | Held-out test size. A value of 0.1 corresponds to a test set of 10% of genes. |
standards.folds | Number of folds to perform cross validation on. |
standards.trials | Number of trials to repeat function prediction evaluations for. |
standards.gamma.minimum | Lower bound of radial basis function kernel coefficient. |
standards.gamma.maximum | Upper bound of radial basis function kernel coefficient. |
standards.gamma.samples | Number of coefficients to sample from the range defined by the minimum and maximum arguments. |
standards.regularization.minimum | Lower bound of regularization parameter (C in scikit-learn SVC). |
standards.regularization.maximum | Upper bound of regularization parameter. |
standards.regularization.samples | Number of regularization parameters to sample from the range defined by the minimum and maximum arguments. |
Miscellaneous | |
consolidation | Whether to consolidate differences in gene sets between datasets by extending datasets to the union of genes ("union") or reducing datasets to the intersection of genes ("intersection"). "union" was used for analyses in the BIONIC manuscript. |
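The difference between the two consolidation modes can be sketched with plain Python sets. This only illustrates which genes end up in the shared gene set; the helper `consolidate_genes` and the gene labels are hypothetical, and how the library fills in values for genes that a dataset lacks under "union" is not shown here.

```python
from typing import Dict, Set


def consolidate_genes(datasets: Dict[str, Set[str]], mode: str = "union") -> Set[str]:
    """Return the shared gene set for a collection of datasets.

    Hypothetical helper for illustration only, not part of bioniceval.
    """
    gene_sets = list(datasets.values())
    if not gene_sets:
        return set()
    if mode == "union":
        # Every gene that appears in at least one dataset.
        return set().union(*gene_sets)
    if mode == "intersection":
        # Only genes that appear in every dataset.
        return set.intersection(*gene_sets)
    raise ValueError(f"Unknown consolidation mode: {mode!r}")


# Made-up gene labels for two datasets with partial overlap.
datasets = {
    "network_a": {"G1", "G2", "G3"},
    "network_b": {"G2", "G3", "G4"},
}

union_genes = consolidate_genes(datasets, "union")
shared_genes = consolidate_genes(datasets, "intersection")

print(sorted(union_genes))
print(sorted(shared_genes))
```

With "union" no gene is discarded but some datasets must be extended to cover genes they did not originally contain, whereas "intersection" keeps only genes common to all datasets at the cost of a smaller evaluation set.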