This project (https://doi.org/10.1145/3512452.3512455) investigates and summarizes the strengths and limitations of different dimensionality reduction schemes and classification methods on specific single-cell RNA sequencing (scRNA-seq) datasets.

Stry233/Automated-Cell-Recognition-Using-Single-cell-RNA-sequencing-with-Machine-Learning

Automated Cell Recognition Using Single-cell RNA sequencing with Machine Learning

This project investigates and summarizes the strengths and limitations of different dimensionality reduction schemes and classification methods on specific single-cell RNA sequencing (scRNA-seq) datasets.

Introduction

Background

Compared with earlier transcriptome analysis methods such as bulk RNA-seq, scRNA-seq captures differential information at the level of individual cells. However, technical errors across cells introduced during data acquisition, along with other limitations, make it difficult to balance data pre-processing against information retention. This report therefore explores and tests several relatively mature dimensionality reduction schemes, including t-SNE, PCA, and combinations of multiple algorithms, and uses the accuracy of machine-learning-based classifiers on cell classification tasks as the base metric for comparing them.
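As a rough illustration of using classifier accuracy to compare reduction schemes, the sketch below scores two hypothetical 2-D embeddings of the same cells with a nearest-centroid classifier. The classifier, the function names, the cell-type labels, and the coordinates are all assumptions chosen for brevity; they are not the classifiers or data used in the project.

```python
from collections import defaultdict
from math import dist


def fit_centroids(points, labels):
    """Average the training points of each label into one centroid."""
    grouped = defaultdict(list)
    for point, label in zip(points, labels):
        grouped[label].append(point)
    return {
        label: tuple(sum(axis) / len(members) for axis in zip(*members))
        for label, members in grouped.items()
    }


def predict(centroids, point):
    """Return the label whose centroid is closest to the point."""
    return min(centroids, key=lambda label: dist(centroids[label], point))


def accuracy(true_labels, predicted_labels):
    """Fraction of cells whose predicted type matches the annotation."""
    hits = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return hits / len(true_labels)


def score_embedding(train, train_labels, test, test_labels):
    """Fit on the training cells and report accuracy on the held-out cells."""
    centroids = fit_centroids(train, train_labels)
    predictions = [predict(centroids, point) for point in test]
    return accuracy(test_labels, predictions)


# Hypothetical labels and embeddings (illustrative values only).
train_labels = ["neuron", "neuron", "glia", "glia"]
test_labels = ["neuron", "glia"]

# Scheme A separates the two cell types well.
scheme_a_train = [(0.0, 0.1), (0.2, 0.0), (5.0, 5.1), (5.2, 4.9)]
scheme_a_test = [(0.1, 0.2), (4.9, 5.0)]

# Scheme B leaves the two cell types overlapping.
scheme_b_train = [(0.0, 0.0), (2.0, 2.0), (1.0, 1.2), (3.0, 3.0)]
scheme_b_test = [(2.1, 2.0), (2.2, 2.3)]

scores = {
    "scheme_a": score_embedding(scheme_a_train, train_labels, scheme_a_test, test_labels),
    "scheme_b": score_embedding(scheme_b_train, train_labels, scheme_b_test, test_labels),
}
print(scores)
```

Holding the classifier fixed while swapping the embedding isolates the effect of the reduction scheme, which is the sense in which accuracy can serve as a base metric for comparison.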

Pipeline

The pipeline for the large-scale cell identification task, from raw data to final classification:
  a. Input: cell labels and Reads Per Kilobase per Million mapped reads (RPKM) values.
  b. Several dimensionality reduction methods, each applied at several target dimensions.
  c. The implementation principle of the combined PCA + t-SNE algorithm.
  d. Visualization in both 2 and 3 dimensions, with and without labels.
  e. Several classifiers, each applied with several parameter settings.
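For readers unfamiliar with the input unit in step a, the sketch below computes RPKM from raw counts using the standard definition: reads for a gene, scaled by gene length in kilobases and by total mapped reads in millions. The gene names, counts, and lengths are hypothetical, and the total is taken as the sum over the listed genes, which is a simplification. The dataset linked from this README is described as already reprocessed, so this step is shown only to explain the unit.

```python
def rpkm(read_counts, gene_lengths_bp):
    """Convert one cell's raw read counts into RPKM values.

    RPKM = reads * 1e9 / (gene length in bp * total mapped reads)
    """
    total_reads = sum(read_counts.values())
    if total_reads == 0:
        raise ValueError("cell has no mapped reads")
    return {
        gene: count * 1e9 / (gene_lengths_bp[gene] * total_reads)
        for gene, count in read_counts.items()
    }


# Hypothetical counts and gene lengths for a single cell.
counts = {"GeneA": 500, "GeneB": 1500, "GeneC": 0}
lengths = {"GeneA": 1000, "GeneB": 3000, "GeneC": 2000}

values = rpkm(counts, lengths)
print(values)
```

GeneB has three times the reads of GeneA but is also three times as long, so both receive the same RPKM; this length normalization is what makes expression comparable across genes.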

Dataset

The reprocessed dataset that supports the conclusions of this paper is publicly available online at https://scquery.cs.cmu.edu/processed_data/.

Contributors

  • Yuetian Chen - Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY, United States, 12180 (email: cheny63@rpi.edu)
  • Chenqi Xu - Southern University of Science and Technology, Shenzhen, China, 518055
  • Yiyang Cao - The University of British Columbia, Vancouver, BC, Canada, V6T 1Z4

Special Thanks

This research was undertaken as part of the CIS - Introduction to Machine Learning "Our Body" Project. Thanks to Prof. Ziv Bar-Joseph for his guidance and instruction in dataset pre-processing and paper refinement.

License

This project is licensed under the MIT License.
