RNA-sequence re-analysis for the correlation of differential gene expression between fetal and adult brains
I did this project to practice genomic data science and to understand it in depth.
You can see its webpage here
I re-analyze the data linked below. The purpose of this re-analysis is to examine the correlation of differential gene expression between fetal and adult brains, as measured by RNA-sequencing. If there is a correlation, I then count how many genes are up-regulated and how many are down-regulated. This part is done in R with RStudio.
In addition, I use the genomic dataset (the already statistically analyzed data) to predict and classify sample characteristics (gender, age). This part is done in Python on Google Colab.
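The counting of up-regulated and down-regulated genes described above can be sketched as follows. The project does this step in R; this Python version only illustrates the logic. The cutoffs (adjusted p-value below 0.05, absolute log2 fold change of at least 1), the function names, and the example values are all assumptions made for illustration, not values taken from the analysis. Which group counts as "up" depends on which group is the numerator of the fold change.

```python
# Illustrative thresholds (assumptions, not taken from the project).
PADJ_CUTOFF = 0.05
LOG2FC_CUTOFF = 1.0


def classify_gene(log2_fold_change, adjusted_p_value,
                  padj_cutoff=PADJ_CUTOFF, log2fc_cutoff=LOG2FC_CUTOFF):
    """Return 'up', 'down', or 'not significant' for one gene."""
    # A missing adjusted p-value is treated as not significant.
    if adjusted_p_value is None or adjusted_p_value >= padj_cutoff:
        return "not significant"
    if log2_fold_change >= log2fc_cutoff:
        return "up"
    if log2_fold_change <= -log2fc_cutoff:
        return "down"
    return "not significant"


def count_regulated(results):
    """Count genes per category.

    `results` is an iterable of (gene_id, log2_fold_change, adjusted_p_value).
    """
    counts = {"up": 0, "down": 0, "not significant": 0}
    for _gene_id, log2fc, padj in results:
        counts[classify_gene(log2fc, padj)] += 1
    return counts


# Made-up values, purely for demonstration.
example_results = [
    ("geneA", 2.3, 0.001),   # significant, large positive change
    ("geneB", -1.7, 0.01),   # significant, large negative change
    ("geneC", 0.4, 0.0005),  # significant, but change is too small
    ("geneD", 3.1, 0.2),     # large change, but not significant
]
print(count_regulated(example_results))
```

With these made-up values, one gene is counted as up-regulated, one as down-regulated, and two as not significant.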
- The article "Developmental regulation of human cortex transcription and its clinical relevance at base resolution": http://www.ncbi.nlm.nih.gov/pmc/articles/PMC4281298/
- The article's RNA-seq data: http://www.ebi.ac.uk/ena/data/view/PRJNA245228
- The article's phenotype metadata for the samples: http://www.ncbi.nlm.nih.gov/bioproject/?term=PRJNA245228
- Download the RNA-seq data and the phenotype metadata (see code book.docx for details)
- Use the Galaxy Project for the following (see code book.docx for details):
  - FASTQ quality control
  - Alignment with HISAT2
  - Feature counts with "featureCounts"
- Build the tidy data (count table)
- Do exploratory and statistical analysis in R
- Predict and classify sample characteristics in Python
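The "tidy data (count table)" step above can be sketched as below: per-sample count files are combined into one table with a row per gene and a column per sample. This is a minimal sketch, not the project's actual code. It assumes each sample's output is a two-column, tab-separated table (gene ID, count) with a `Geneid` header row. The sample names, the helper names `read_counts` and `build_count_table`, and the choice to fill missing genes with 0 are all hypothetical.

```python
import csv
import io


def read_counts(handle):
    """Read one two-column (gene ID, count) tab-separated table into a dict."""
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines, comment lines, and the header row.
        if not row or row[0].startswith("#") or row[0] == "Geneid":
            continue
        counts[row[0]] = int(row[1])
    return counts


def build_count_table(per_sample_counts):
    """Combine per-sample dicts into (header, rows).

    Each row is [gene_id, count_in_sample_1, count_in_sample_2, ...].
    Genes missing from a sample get a count of 0 (an assumption).
    """
    samples = sorted(per_sample_counts)
    genes = sorted(set().union(*(c.keys() for c in per_sample_counts.values())))
    header = ["gene_id"] + samples
    rows = [
        [gene] + [per_sample_counts[s].get(gene, 0) for s in samples]
        for gene in genes
    ]
    return header, rows


# Made-up file contents standing in for two downloaded count files.
fetal_text = "Geneid\tfetal_1\ngeneA\t10\ngeneB\t0\n"
adult_text = "Geneid\tadult_1\ngeneA\t3\ngeneB\t25\n"

per_sample = {
    "fetal_1": read_counts(io.StringIO(fetal_text)),
    "adult_1": read_counts(io.StringIO(adult_text)),
}
header, rows = build_count_table(per_sample)
print("\t".join(header))
for row in rows:
    print("\t".join(str(value) for value in row))
```

In practice the `io.StringIO` objects would be replaced by files opened with `open(path, newline="")`, and the resulting table could be written out with `csv.writer` for loading into R.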
- Only 10 samples are used, so the sample size is too small to generalize to a larger population, and the results may be biased
- Genomic Data Science Specialization audit courses (https://www.coursera.org/specializations/genomic-data-science)
- https://github.com/friveramariani/GenomicDataScience_FetalAdultBrain.git
- https://github.com/jtleek/datasharing.git
- http://jtleek.com/genstats_site/