The aim of this project is to study differential gene expression in 19 athletes under physical and psychological stress: before and after a run in extreme highland conditions (2450-3450 m, Mount Elbrus), and also at the "start" point before arrival at the competition (St. Petersburg).
- Processing and evaluating the quality of raw reads
- Alignment to the human reference genome (GRCh38.p13)
- Quantification of gene and isoform expression levels
- Analysis of differential gene expression
- Functional analysis of differentially expressed genes
- Cluster analysis
- Some samples had several pairs of read files; these were merged with the merger.sh script.
- The quality of raw reads was checked with FastQC (v0.11.9).
- Reads were aligned with STAR (v2.7) to the GENCODE Release 36 reference genome (GRCh38.p13, primary assembly).
We ran the following command to generate the genome indexes:
```shell
STAR --runThreadN 8 \
  --runMode genomeGenerate \
  --genomeDir /path/to/genomeDir \
  --genomeFastaFiles /path/to/genome/fasta \
  --sjdbGTFfile /path/to/annotations.gtf \
  --sjdbOverhang 99
```
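The STAR manual recommends setting `--sjdbOverhang` to the maximum read length minus 1, so a value of 99 matches reads of up to 100 bp (the manual also notes that 100 works about as well in most cases). A minimal sketch for checking this against a gzipped FASTQ file follows; the function name `sjdb_overhang` is hypothetical and not part of the project's scripts, and it scans the whole file, which can be slow for large inputs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project's scripts).
# sjdb_overhang FASTQ_GZ
# Prints (maximum read length - 1) for a gzipped FASTQ file, which is the
# value the STAR manual recommends for --sjdbOverhang. Reads the whole file.
sjdb_overhang() {
  local fq=$1
  # In FASTQ, the sequence is on line 2 of every 4-line record.
  gzip -dc "$fq" | awk '
    BEGIN { max = 0 }
    NR % 4 == 2 && length($0) > max { max = length($0) }
    END { print max - 1 }'
}
```

For paired-end data, running it on both mates (e.g. `sjdb_overhang /path/to/read_R1.fastq.gz`) shows whether R1 and R2 have the same length.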
Then we ran the following command to align the reads:
```shell
STAR --genomeDir /path/to/genomeDir \
  --sjdbGTFfile /path/to/annotations.gtf \
  --readFilesCommand zcat \
  --readFilesIn /path/to/read_R1.fastq.gz /path/to/read_R2.fastq.gz \
  --outSAMtype BAM SortedByCoordinate \
  --limitBAMsortRAM 16000000000 \
  --outSAMunmapped Within \
  --outFilterMultimapNmax 1 \
  --quantMode TranscriptomeSAM \
  --runThreadN 8 \
  --outFileNamePrefix "/path/to/out/file."
```
- The key output of the previous step is file.Aligned.toTranscriptome.out.bam (produced by --quantMode TranscriptomeSAM), which contains alignments in transcript coordinates and is used to quantify gene and isoform expression. We used RSEM (v1.3.3) for this:
```shell
rsem-calculate-expression --paired-end \
  --bam \
  --no-bam-output \
  -p 8 \
  /path/to/file.Aligned.toTranscriptome.out.bam \
  /path/to/genome/index out_file_prefix
```
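The RSEM reference passed as `/path/to/genome/index` has to be built beforehand with `rsem-prepare-reference`. Because the transcript-coordinate BAM must describe the same transcripts as the RSEM reference, the reference should be built from the same GTF used for the STAR index. How the project built it is not stated; the sketch below assumes the same GENCODE FASTA and GTF as above, and the wrapper name `build_rsem_reference` is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper (not part of the project's scripts).
# build_rsem_reference GENOME_FASTA ANNOTATION_GTF REF_PREFIX
# Extracts transcript sequences from the genome using the GTF and writes the
# RSEM reference files with the prefix REF_PREFIX (e.g. /path/to/genome/index).
build_rsem_reference() {
  local fasta=$1 gtf=$2 ref_prefix=$3
  # Make sure the output directory exists before RSEM writes into it.
  mkdir -p "$(dirname "$ref_prefix")"
  rsem-prepare-reference --gtf "$gtf" "$fasta" "$ref_prefix"
}
```

Usage with the placeholder paths from this README: `build_rsem_reference /path/to/genome/fasta /path/to/annotations.gtf /path/to/genome/index`.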
- Differential expression analysis was performed with the DESeq2 (v1.30.0) R package, using two R scripts: one for gene-level and one for isoform-level analysis.
- Lists of differentially expressed genes and isoforms were analysed with MSigDB and GeneQuery.
- Cluster analysis was performed in Phantasus.
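DESeq2 starts from a feature-by-sample count matrix, while RSEM writes one `*.genes.results` or `*.isoforms.results` file per sample, with `expected_count` in the fifth column of both. The project's R scripts are not shown here, so the following is only a minimal sketch of how such a matrix could be assembled; the function name `count_matrix` is hypothetical, and it assumes all files list the same IDs in the same order, as they do when produced from the same RSEM reference. RSEM also ships a `rsem-generate-data-matrix` script for this purpose.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the project's scripts).
# count_matrix RESULTS_FILE...
# Prints a tab-separated matrix to stdout: an "id" column followed by one
# expected_count column (field 5 of RSEM *.results files) per input file.
# Column names are file names without the directory and RSEM suffix.
count_matrix() {
  awk -F'\t' -v OFS='\t' '
    FNR == 1 {                          # header line of each file
      nfile++
      name = FILENAME
      sub(/.*\//, "", name)
      sub(/\.(genes|isoforms)\.results$/, "", name)
      names[nfile] = name
      next
    }
    nfile == 1 { ids[FNR] = $1; nrow = FNR }
    nfile > 1 && ids[FNR] != $1 {      # IDs must line up across files
      print "ID mismatch in " FILENAME " at line " FNR > "/dev/stderr"
      bad = 1
    }
    { counts[FNR, nfile] = $5 }
    END {
      if (bad) exit 1
      header = "id"
      for (j = 1; j <= nfile; j++) header = header OFS names[j]
      print header
      for (i = 2; i <= nrow; i++) {
        line = ids[i]
        for (j = 1; j <= nfile; j++) line = line OFS counts[i, j]
        print line
      }
    }' "$@"
}
```

RSEM expected counts are not integers, whereas `DESeqDataSetFromMatrix` requires integer counts, so rounding the matrix or importing the RSEM output with tximport are common options.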
We compared the three conditions (the "start" point, before the run and after the run) pairwise and obtained a list of differentially expressed genes for each comparison. More information about the functional analysis of these gene lists can be found here. As next steps, we plan a time-series analysis and correction for donor effects, and we also intend to study gene sets associated with neurodegenerative diseases.