UCB-stat-159-s22/hw07-Group14

Binder

Single Cell Sequencing Analysis Project

Jennefer, Claudea; Kim, Wendy; Tsai, Gordon; Villouta, Catalina

Project Scope

In this project we work directly with publicly available single-cell RNA-seq data, with the aim of classifying Mus musculus (house mouse) cells by the organ they came from. Given limited computational resources, we decided to work with cells from kidney and liver only; however, the approach presented here generalizes to as many organs as needed.

Project Goals

  • Implement an autoencoder to find a low-dimensional latent representation of the cells.
  • Show that the latent representation is more useful than PCA.
  • Implement a model built on top of the encoder for classifying cells into kidney or liver.
  • Obtain high out-of-sample performance for the classifier.
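
To make these goals concrete, here is a minimal sketch of the overall pipeline: an autoencoder, a classifier on top of its frozen encoder, and a PCA baseline with the same latent size. It is an illustration only and not the project's implementation. The data are synthetic, the architecture (tanh encoder, linear decoder, two latent dimensions), the learning rates, and every name in the block are assumptions, and plain NumPy is used only to keep the example self-contained.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for normalized expression: 200 cells x 50 genes.
# Label 0 = kidney, 1 = liver (arbitrary coding); "liver" cells are shifted on 10 genes.
n_cells, n_genes, n_latent = 200, 50, 2
labels = rng.integers(0, 2, size=n_cells)
X = rng.normal(size=(n_cells, n_genes))
X[labels == 1, :10] += 2.0

# Hold out 50 cells so the classifier is scored out-of-sample.
order = rng.permutation(n_cells)
train, test = order[:150], order[150:]
gene_means = X[train].mean(axis=0)
X_train, X_test = X[train] - gene_means, X[test] - gene_means
y_train, y_test = labels[train], labels[test]

# Autoencoder: tanh encoder, linear decoder, squared reconstruction error.
W_enc = rng.normal(scale=0.1, size=(n_genes, n_latent))
W_dec = rng.normal(scale=0.1, size=(n_latent, n_genes))

def encode(data):
    return np.tanh(data @ W_enc)

def reconstruction_loss(data):
    return float(np.mean((encode(data) @ W_dec - data) ** 2))

loss_before = reconstruction_loss(X_train)
for _ in range(500):  # full-batch gradient descent; constant factors folded into the step size
    Z = encode(X_train)
    err = Z @ W_dec - X_train
    grad_dec = Z.T @ err / len(X_train)
    grad_enc = X_train.T @ ((err @ W_dec.T) * (1 - Z ** 2)) / len(X_train)
    W_dec -= 0.01 * grad_dec
    W_enc -= 0.01 * grad_enc
loss_after = reconstruction_loss(X_train)

def fit_logistic(Z, y, steps=500, lr=0.5):
    """Logistic regression by gradient descent on fixed features Z."""
    w, b = np.zeros(Z.shape[1]), 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(Z @ w + b)))
        w -= lr * Z.T @ (p - y) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b

def accuracy(Z, y, w, b):
    return float(np.mean((Z @ w + b > 0) == (y == 1)))

# Classifier on top of the frozen encoder.
w_ae, b_ae = fit_logistic(encode(X_train), y_train)
acc_autoencoder = accuracy(encode(X_test), y_test, w_ae, b_ae)

# PCA baseline with the same latent size, via SVD of the centered training data.
_, _, Vt = np.linalg.svd(X_train, full_matrices=False)
components = Vt[:n_latent].T
w_pca, b_pca = fit_logistic(X_train @ components, y_train)
acc_pca = accuracy(X_test @ components, y_test, w_pca, b_pca)

print(f"reconstruction loss: {loss_before:.3f} -> {loss_after:.3f}")
print(f"held-out accuracy: autoencoder {acc_autoencoder:.2f}, PCA {acc_pca:.2f}")
```

Scoring both latent representations with the same classifier on the same held-out cells is one simple way to compare them. On toy data this cleanly separated, both are likely to do well, so the sketch shows the mechanics of the comparison and not its outcome on real cells.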

Dependencies

All dependencies are listed in environment.yml and book-requirements.txt.

Dataset

The datasets used in this project contain single-cell data from the kidney and liver. We obtained them from the Tabula Muris project, a compendium of single-cell transcriptome data from Mus musculus (specific links are given in the Reference section). The data used in this project can be found in the data folder:

  • Kidney-counts.csv
  • Liver-counts.csv
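
As a rough sketch of how the two files could be combined into one labeled set of cells, the example below reads small in-memory stand-ins with Python's standard csv module. The layout is an assumption (first column holds gene names, every other column is one cell), and the cell names, gene names, counts, and the read_counts helper are all hypothetical. Check the layout against the real files in the data folder before relying on it.

```python
import csv
import io

# Tiny in-memory stand-ins for data/Kidney-counts.csv and data/Liver-counts.csv.
# Assumed layout: first column is the gene name, remaining columns are cells.
kidney_csv = io.StringIO(",cellA,cellB\nGene1,0,3\nGene2,5,1\n")
liver_csv = io.StringIO(",cellC\nGene1,2\nGene2,0\n")

def read_counts(handle, organ):
    """Return (genes, cells), where each cell is (cell_id, organ, counts per gene)."""
    reader = csv.reader(handle)
    cell_ids = next(reader)[1:]              # header row: blank corner, then cell ids
    genes, columns = [], [[] for _ in cell_ids]
    for record in reader:
        genes.append(record[0])
        for column, value in zip(columns, record[1:]):
            column.append(int(value))        # transpose: collect counts per cell
    cells = [(cell, organ, counts) for cell, counts in zip(cell_ids, columns)]
    return genes, cells

genes_kidney, kidney_cells = read_counts(kidney_csv, "kidney")
genes_liver, liver_cells = read_counts(liver_csv, "liver")

# Both files must list the same genes in the same order before stacking cells.
assert genes_kidney == genes_liver
cells = kidney_cells + liver_cells

for cell_id, organ, counts in cells:
    print(cell_id, organ, counts)
```

With the real files, the same function could be called on handles opened with open("data/Kidney-counts.csv", newline=""), assuming the layout above holds.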

Setup

  • Create Virtual Environment:

    • using environment.yml in terminal
      • mamba env create -f environment.yml -p ~/envs/genes
      • python -m ipykernel install --user --name genes --display-name "IPython - genes"
    • using Makefile in terminal: make env
  • Activate Virtual Environment:

    • conda activate genes
  • Install genetools package from source, via:

    • pip install .
  • Run analysis using Makefile in terminal:

    • make all

Run tests

  • Run pytest genetools in the terminal to run all the tests.

License

This project is released under the terms of the BSD 3-clause License.

Reference

We obtained the data from the Tabula Muris project released in 2017 by The Chan Zuckerberg Biohub. All matrices of gene-cell counts and metadata are available as CSVs on Figshare. We specifically used the data for kidney and liver cells from the FACS-based full-length transcript analysis released in 2018.

  • Consortium, Tabula Muris; Webber, James; Batson, Joshua; Pisco, Angela (2018): Single-cell RNA-seq data from Smart-seq2 sequencing of FACS sorted cells (v2). figshare. Dataset. DOI