
Automated Cell Recognition Using Single-Cell RNA Sequencing with Machine Learning


This project investigates and summarizes the strengths and limitations of different dimensionality reduction schemes and classification methods on specific single-cell RNA sequencing (scRNA-seq) datasets.


Introduction

Background

Although scRNA-seq technology captures cell-level differences that earlier transcriptome analysis methods such as bulk RNA-seq cannot, technical errors that arise across cells during data acquisition, along with other limitations, make it difficult for researchers to balance data pre-processing against information retention. To address this, this report explores and tests several relatively mature dimensionality reduction schemes, including t-SNE, PCA, and combinations of multiple algorithms, and uses the accuracy of machine-learning-based classifiers on cell classification tasks as the base metric for a comprehensive comparison.
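A minimal sketch of the evaluation idea described above (reduce dimensionality, then score a classifier's accuracy), in plain Python with no third-party libraries. It is not the project's code: the toy data are hypothetical, PCA is computed by power iteration with deflation rather than the SVD solvers libraries usually use, and a nearest-centroid classifier stands in for the classifiers actually compared. t-SNE is omitted because standard t-SNE does not learn a mapping that can be applied to new, unseen cells.

```python
import math

# Hypothetical toy data: rows are cells, columns are 4 genes.
# Type "A" cells express genes 0-1 highly; type "B" cells express genes 2-3.
TRAIN_X = [[9, 8, 1, 0], [8, 9, 0, 1], [10, 9, 1, 1],
           [1, 0, 9, 8], [0, 1, 8, 9], [1, 1, 10, 9]]
TRAIN_Y = ["A", "A", "A", "B", "B", "B"]
TEST_X = [[9, 9, 0, 1], [0, 1, 9, 9]]
TEST_Y = ["A", "B"]


def fit_pca(X, k, iters=500):
    """Return (gene means, top-k principal axes) via power iteration with deflation."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    Xc = [[row[j] - means[j] for j in range(d)] for row in X]
    # Sample covariance matrix (d x d) of the centered data.
    C = [[sum(r[i] * r[j] for r in Xc) / (n - 1) for j in range(d)] for i in range(d)]
    axes = []
    for _ in range(k):
        # Deterministic, non-uniform start vector (unlikely to be orthogonal to the target axis).
        v = [1.0 / (i + 1) for i in range(d)]
        for _ in range(iters):
            w = [sum(C[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the variance captured along v (its eigenvalue).
        lam = sum(v[i] * sum(C[i][j] * v[j] for j in range(d)) for i in range(d))
        axes.append(v)
        # Deflation: remove this component so the next pass finds the next axis.
        C = [[C[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return means, axes


def transform(X, means, axes):
    """Project cells onto the principal axes, reusing the training means for centering."""
    return [[sum((row[j] - means[j]) * a[j] for j in range(len(means))) for a in axes]
            for row in X]


def nearest_centroid_accuracy(train_z, train_y, test_z, test_y):
    """Fit a nearest-centroid classifier in the reduced space and return test accuracy."""
    centroids = {}
    for label in sorted(set(train_y)):
        pts = [z for z, y in zip(train_z, train_y) if y == label]
        centroids[label] = [sum(col) / len(pts) for col in zip(*pts)]

    def predict(z):
        # Assign the label whose centroid is closest in squared Euclidean distance.
        return min(centroids,
                   key=lambda lab: sum((a - b) ** 2 for a, b in zip(z, centroids[lab])))

    correct = sum(predict(z) == y for z, y in zip(test_z, test_y))
    return correct / len(test_y)


means, axes = fit_pca(TRAIN_X, k=1)
accuracy = nearest_centroid_accuracy(
    transform(TRAIN_X, means, axes), TRAIN_Y,
    transform(TEST_X, means, axes), TEST_Y,
)
print(f"PCA (k=1) + nearest-centroid accuracy: {accuracy:.2f}")
```

The reducer is fitted on the training cells only, so no information from the test cells leaks into the reduced space. Swapping `fit_pca` for another reducer, or `nearest_centroid_accuracy` for another classifier, keeps the same comparison loop.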

Pipeline


This is the pipeline for the large-scale cell identification task, from raw data to final classification:
a. Labels plus Reads Per Kilobase per Million mapped reads (RPKM).
b. Multiple dimensionality reduction methods, each applied at multiple target dimensions.
c. The implementation principle of the combined PCA + t-SNE algorithm.
d. Visualization in both 2 and 3 dimensions, with and without labels.
e. Multiple classifiers, each applied with multiple parameter settings.
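Step a of the pipeline works with RPKM values. A minimal sketch of the standard formula, RPKM = reads × 10^9 / (gene length in bp × total mapped reads), for a single cell. The gene names and counts below are hypothetical, and total mapped reads is approximated by the sum of the per-gene counts; in a real pipeline that total usually comes from the aligner.

```python
def rpkm(counts, gene_lengths_bp):
    """Convert one cell's raw read counts to RPKM.

    counts: mapping gene -> number of reads mapped to that gene
    gene_lengths_bp: mapping gene -> gene (or transcript) length in base pairs
    """
    # Assumption: total mapped reads is taken as the sum of the per-gene counts.
    total_mapped = sum(counts.values())
    if total_mapped == 0:
        return {gene: 0.0 for gene in counts}
    # Equivalent to count / (length_in_kb * total_reads_in_millions).
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_mapped)
        for gene, count in counts.items()
    }


# Hypothetical example: two genes in one cell.
cell_counts = {"GeneA": 500, "GeneB": 1500}
lengths = {"GeneA": 2000, "GeneB": 1000}
print(rpkm(cell_counts, lengths))  # {'GeneA': 125000.0, 'GeneB': 750000.0}
```

Dividing by gene length makes values comparable across genes within a cell, and dividing by total mapped reads makes them comparable across cells sequenced to different depths.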

Dataset

The reprocessed dataset that supports the conclusions of this paper is publicly available online at https://scquery.cs.cmu.edu/processed_data/.

Graphics


Contributors

  • Yuetian Chen - Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, United States, 12180 (email: cheny63@rpi.edu)
  • Chenqi Xu - Southern University of Science and Technology, Shenzhen, China, 518055
  • Yiyang Cao - The University of British Columbia, Vancouver, BC, Canada, V6T 1Z4

Special Thanks

This research was undertaken as part of the CIS - Introduction to Machine Learning "Our Body" Project. Thanks to Prof. Ziv Bar-Joseph for his guidance and instruction in dataset pre-processing and paper refinement.

License

MIT License