Here we benchmark three single-cell network inference algorithms on their reproducibility, i.e. their ability to infer similar networks when applied to two independent datasets from the same biological condition.
The benchmarked methods are:
The methods are tested in three biological contexts:
- human retina
- colorectal cancer (CRC) T-cells
- different cell types from hematopoiesis
Please note that, by default, the notebook runs the comparison on human retina. To start from a different dataset, uncomment the lines for the biological context of interest in the data-loading cell.
As detailed in the paper, the data used for this benchmark are the following:
- Retina: Menon, M. et al.; Lukowski SW. et al.
- CRC T-cells: Zhang et al.; Li et al.
- Hematopoiesis: Hay et al.; Setty et al.
The preprocessed input data are available at https://cloud.biologie.ens.fr/index.php/s/JuJgrIL1jC6yZh4/download. Details on the preprocessing steps are provided in the methods of the paper.
To access all data:
- Clone or download the scNET repository
- From an R terminal or RStudio, run the following lines:
setwd('../scNET/')
dataURL = 'https://cloud.biologie.ens.fr/index.php/s/JuJgrIL1jC6yZh4/download'
download.file(dataURL, 'scNET_data.zip')
unzip('scNET_data.zip')
- On macOS, unzipping the data file from the terminal may be more efficient:
cd ~/scNET/
unzip scNET_data.zip
- Install conda from https://docs.conda.io/en/latest/miniconda.html
- Create the conda environment from the yml file in the scNET repository by entering the following lines in a terminal:
cd scNET
conda env create -f scNET.yml
- Enter the conda environment:
conda activate scNET
- Launch the notebook with
jupyter-notebook
The preprint describing this benchmark is available on bioRxiv: https://www.biorxiv.org/content/10.1101/2020.11.10.375923v1
Users can analyze the reproducibility of networks produced by other algorithms using this workflow.
To do so, save two networks inferred from independent datasets into the scNET Results folder.
Networks must be formatted as 3 columns (column 1: gene1, column 2: gene2, column 3: interaction weight) in a .tsv or other tab-separated file.
Then, run the notebook section Algorithm reproducibility evaluation to calculate the metrics.
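Before running the notebook on your own networks, it can help to check that both files follow the 3-column layout described above. The Python sketch below is only an illustration and is not part of the scNET notebook. It parses two tab-separated networks and computes a Jaccard index on their edge sets as a simple stand-in for a reproducibility score. The function names `read_network` and `jaccard_index`, the toy gene pairs, the absence of a header row, and the treatment of edges as undirected are all assumptions. The metrics actually computed by the notebook are described in the paper and may differ.

```python
import csv
import io


def read_network(handle):
    """Parse a 3-column, tab-separated network: gene1, gene2, weight.

    Assumes the file has no header row (an assumption, not a scNET rule).
    """
    edges = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        if len(row) != 3:
            raise ValueError(f"expected 3 columns, got {len(row)}: {row}")
        gene1, gene2, weight = row
        # Assumption: edges are undirected, so (A, B) and (B, A) are the same edge.
        edges[frozenset((gene1, gene2))] = float(weight)
    return edges


def jaccard_index(edges_a, edges_b):
    """Shared edges divided by all edges found in either network."""
    a, b = set(edges_a), set(edges_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy networks standing in for two files saved in the Results folder.
network_1 = "TP53\tMDM2\t0.9\nGATA1\tTAL1\t0.7\nSPI1\tCEBPA\t0.4\n"
network_2 = "MDM2\tTP53\t0.8\nGATA1\tTAL1\t0.6\nRUNX1\tSPI1\t0.5\n"

edges_a = read_network(io.StringIO(network_1))
edges_b = read_network(io.StringIO(network_2))

score = jaccard_index(edges_a, edges_b)
print(f"Jaccard index: {score:.2f}")  # 2 shared edges out of 4 distinct -> 0.50
```

To use this with real files, replace the `io.StringIO(...)` objects with `open(path, newline="")` on the two network files. A `ValueError` from `read_network` indicates a row that does not have exactly three tab-separated fields, or a weight that is not numeric (for example, a header row).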