Network Evaluation Tools is a Python 2.7 package with corresponding examples for evaluating a network's ability to group a given node set in network proximity. This package was developed as a part of the work done in Huang and Carlin et al. 2018.
- data_import_tools - This module contains functions for helping import network files and gene set files for analysis.
- gene_conversion_tools - This module contains functions for helping convert, filter, and save networks from their raw database form. Used in the Network Processing Jupyter Notebooks.
- miscellaneous_functions - This module contains various functions developed to help with analysis along the way. These functions are not well tested and may contain bugs. These functions were generally used to determine other network performance metrics on network recovery of gene sets.
- network_evaluation_functions - This module contains many of the core functions of the set-based network evaluation algorithm.
- network_propagation - This module contains functions to help with network propagation steps used in the set-based network evaluation algorithm.
Currently, the network_evaluation_tools package requires Python 2.7 (versions 2.7 through 2.7.13). Some functions in this package may not work with Python 3.0+. network_evaluation_tools requires:
- Argparse >= 1.1
- NetworkX >= 2.1
- Numpy >= 1.11.0
- Matplotlib >= 1.5.1
- Pandas >= 0.19.0
- Requests >= 2.13.0
- Scipy >= 0.17.0
- Scikit-learn >= 0.17.1
Note:
- In Pandas v0.20.0+, the .ix indexer has been deprecated. Warnings about this may appear, but the affected functions still work.
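For code that triggers this warning, pandas recommends the label-based .loc and position-based .iloc indexers as replacements for .ix. The sketch below shows the equivalent lookups on a made-up score table; the table and its row and column names are hypothetical and are not taken from the package.

```python
import pandas as pd

# Hypothetical toy table: rows are gene sets, columns are networks.
scores = pd.DataFrame(
    [[1.5, 0.2], [0.7, 2.1]],
    index=["set_a", "set_b"],
    columns=["net_1", "net_2"],
)

# Label-based lookup: replaces scores.ix["set_a", "net_2"]
by_label = scores.loc["set_a", "net_2"]

# Position-based lookup: replaces scores.ix[1, 0]
by_position = scores.iloc[1, 0]

print(by_label, by_position)
```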
- Clone the repository
- cd to the new repository
- Execute the following command:
python setup.py install
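After installing, it can be useful to confirm that the dependencies listed above are present at the required versions. The following is a minimal sketch using only the standard library; it is not part of the package. The helper names (version_tuple, check_requirements) are hypothetical, the import names are assumed to be the usual ones (for example sklearn for Scikit-learn), and the version comparison is deliberately simple and only looks at leading numeric parts.

```python
import importlib

# Minimum versions copied from the requirements list in this README.
# Keys are the assumed import names, not the display names.
REQUIREMENTS = {
    "argparse": "1.1",
    "networkx": "2.1",
    "numpy": "1.11.0",
    "matplotlib": "1.5.1",
    "pandas": "0.19.0",
    "requests": "2.13.0",
    "scipy": "0.17.0",
    "sklearn": "0.17.1",
}


def version_tuple(version):
    """Turn '1.11.0rc1' into (1, 11): leading numeric parts, trailing zeros dropped."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if digits == "":
            break  # stop at the first non-numeric component, e.g. 'dev0'
        parts.append(int(digits))
    # Drop trailing zeros so that '1.5' and '1.5.0' compare as equal.
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def check_requirements(requirements):
    """Return {module: status}: 'ok', 'missing', 'too old' or 'unknown version'."""
    report = {}
    for name, minimum in requirements.items():
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = "missing"
            continue
        found = getattr(module, "__version__", None)
        if found is None:
            report[name] = "unknown version"
        elif version_tuple(found) >= version_tuple(minimum):
            report[name] = "ok"
        else:
            report[name] = "too old"
    return report
```

Calling check_requirements(REQUIREMENTS) returns a dictionary that maps each module name to its status, which can then be printed or inspected before running the notebooks.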
- If the network needs to be normalized to a particular naming scheme:
A Jupyter Notebook describing how each network was processed from the raw download file in the original paper can be found in the Network Processing Notebooks folder.
- There are two ways to perform the network evaluation on a gene set:
The following network analyses can be performed either from a Jupyter Notebook or from the command line (see Network Evaluation Examples folder). Jupyter notebooks are documented within the notebook and the documentation for the python scripts can be seen using the command
python [script_name].py -h
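To make the gene set input concrete, here is a minimal sketch of reading a gene set file. It assumes the same layout as the gene set files described in this README: tab-separated text, one gene set per line, with the set name in the first column and the member genes after it. The function read_gene_sets is a hypothetical helper written for illustration; the package's data_import_tools module provides its own import functions, which may behave differently.

```python
import io


def read_gene_sets(path):
    """Read a tab-separated gene set file into {set_name: set_of_genes}.

    Hypothetical helper for illustration. Assumes one gene set per line,
    with the set name in the first column and member genes after it.
    """
    gene_sets = {}
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            columns = line.rstrip("\r\n").split("\t")
            name = columns[0].strip()
            if not name:
                continue  # skip blank or nameless lines
            # Ignore empty cells left by trailing tabs.
            genes = set(gene.strip() for gene in columns[1:] if gene.strip())
            gene_sets[name] = genes
    return gene_sets
```

Storing each gene set as a Python set drops duplicate gene names and makes membership checks cheap when comparing against the nodes of a network.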
- Database Citations - An Excel file containing details about all of the networks used in the original paper's analysis and affiliated citations for all of the databases used.
- DisGeNET / Oncogenic Component Gene Sets - Two tab-separated files, each line containing a gene set from either DisGeNET or the Oncogenic Component collection. The first column of each file is the name of the gene set, followed on the same line by the list of genes associated with that gene set.
- Network performance (AUPRCs) on DisGeNET / Oncogenic Component Gene Sets - Two CSV files containing the raw Z-normalized AUPRC scores (network performance scores) of each network analyzed on each gene set analyzed from DisGeNET or the Oncogenic Component gene set collection.
- Network performance effect sizes on DisGeNET / Oncogenic Component Gene Sets - Two CSV files containing the relative performance gain of each network's AUPRC score over the median null AUPRC score for each gene set analyzed from DisGeNET or the Oncogenic Component gene set collection.
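The two score types in these files can be illustrated with a short sketch. It assumes that the Z-normalized score is the observed AUPRC minus the mean of a set of null AUPRC scores, divided by their standard deviation, and that the effect size is the observed AUPRC minus the median null AUPRC, divided by that median. These formulas are a reading of the descriptions above, not the package's code; the function names are hypothetical, and the choice of population standard deviation is an assumption.

```python
import math


def median(values):
    """Median of a non-empty list of numbers."""
    ordered = sorted(values)
    mid = len(ordered) // 2
    if len(ordered) % 2:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def z_score(observed, null_scores):
    """(observed - mean of null) / standard deviation of null."""
    n = float(len(null_scores))
    mean = sum(null_scores) / n
    # Population variance; whether the paper uses the sample form is not stated here.
    variance = sum((x - mean) ** 2 for x in null_scores) / n
    return (observed - mean) / math.sqrt(variance)


def effect_size(observed, null_scores):
    """Relative gain of the observed score over the median null score."""
    null_median = median(null_scores)
    return (observed - null_median) / null_median
```

For example, an observed AUPRC of 0.30 against null scores whose median is 0.10 gives an effect size of about 2.0, meaning the network scores roughly 200% above the median null under these assumptions.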
Please feel free to post issues or bug reports. Questions can be sent to jkh013@ucsd.edu.
See the LICENSE file for license rights and limitations (MIT).