Concordance analysis of atrophy in MS and regional gene expression
These are parts of the scripts contributing to my Master's thesis research project.
To prepare the data, first run the two Jupyter notebooks. This requires the gene expression data from the Allen Institute, available here: http://human.brain-map.org/static/download
The other requirement is a set of updated (or improved) MNI coordinates, e.g. the ones provided by Chris Gorgolewski, available here: https://github.com/chrisgorgo/alleninf/tree/master/alleninf/data
-
AIBS_sample_preprocessing.ipynb This script selects the eligible samples for the analysis.
-
AIBS_probe_selection.ipynb This script selects the eligible gene expression probes for the analysis.
-
Coordinates to values.ipynb This script obtains the EBM stage for a given MNI coordinate.
-
Regional_correlation.ipynb This script adds regional labels and EBM stages to each sample ID. It also contains the correlation analysis for the temporal progression of atrophy in MS.
-
Genetic_correlation_plotting.ipynb This script plots the figures for the correlation between MS progression and the ten most strongly correlated genes in the positive and negative lists.
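As a rough illustration of the lookup done in Coordinates to values.ipynb, the sketch below converts an MNI coordinate (in mm) to a voxel index and reads a stage value at that voxel. It is not the notebook's code. It assumes the stage map is on the 2 mm MNI152 grid as shipped with FSL; the names `mni_to_voxel`, `stage_at` and the toy `stage_volume` dictionary are hypothetical, and plain Python is used only to keep the example dependency-free.

```python
# Affine of the 2 mm MNI152 template shipped with FSL (voxel index -> mm).
# ASSUMPTION: the EBM stage map lives on this grid; swap in the affine of
# your own image if it does not.
MNI152_2MM_AFFINE = (
    (-2.0, 0.0, 0.0, 90.0),
    (0.0, 2.0, 0.0, -126.0),
    (0.0, 0.0, 2.0, -72.0),
    (0.0, 0.0, 0.0, 1.0),
)


def mni_to_voxel(coord, affine=MNI152_2MM_AFFINE):
    """Convert an MNI coordinate in mm to the nearest voxel index.

    Only valid for axis-aligned affines (scaling plus translation,
    no rotation or shear), which holds for the template above.
    """
    return tuple(
        int(round((coord[axis] - affine[axis][3]) / affine[axis][axis]))
        for axis in range(3)
    )


def stage_at(coord, stage_volume, affine=MNI152_2MM_AFFINE):
    """Return the stage stored at the voxel nearest to `coord`, or None."""
    return stage_volume.get(mni_to_voxel(coord, affine))


# Hypothetical toy stage map: voxel index -> EBM stage.
stage_volume = {(45, 63, 36): 3, (60, 73, 41): 7}

origin_voxel = mni_to_voxel((0, 0, 0))
stage_here = stage_at((-30, 20, 10), stage_volume)
stage_missing = stage_at((50, 50, 50), stage_volume)
```

With a real NIfTI stage map, the dictionary would be replaced by the image array and its own affine.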
After the correlation step, the genes most strongly correlated with progression can be obtained (ranked by their r values). These genes can then be submitted for Gene Ontology (GO) overrepresentation analysis. In this project, we used the online GO tool GOrilla, which can be accessed at: http://cbl-gorilla.cs.technion.ac.il/
GO results can be downloaded as .xls files from the online GO tool. As JupyterLab does not recognise .xls files, we converted them to .xlsx files before analysis. After that, the results can be analysed according to the order of each gene's r value and the enrichment levels.
Rank.ipynb This script matches r values to all genes in order.
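The correlation-and-ranking step can be sketched as follows. This is a minimal illustration, not the notebooks' code: it assumes a Pearson correlation between each gene's expression and the EBM stage of the same samples (the correlation type is not stated here), and the gene names and numbers are made up.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_genes(expression, stages):
    """Return (gene, r) pairs sorted from most positive to most negative r."""
    r_values = {gene: pearson_r(values, stages) for gene, values in expression.items()}
    return sorted(r_values.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: one EBM stage per sample, one expression value
# per sample for each gene.
stages = [1, 2, 3, 4, 5]
expression = {
    "GENE_A": [1.0, 2.0, 3.0, 4.0, 5.0],
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],
    "GENE_C": [2.0, 1.0, 3.0, 1.0, 2.0],
}

ranked = rank_genes(expression, stages)
ranked_names = [gene for gene, _ in ranked]
```

The top and bottom of such a list correspond to the positive and negative gene lists mentioned above.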
As we also wanted to know how putative brain cell types correlate with the temporal progression of MS, we combined the gene expression data with cell fraction data (Menden et al. 2020).
Cell_type_plotting.ipynb This script uses the data in celltype_deconvolution.txt to calculate the expression level of the various cell types by EBM stage.
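One way to summarise cell types by EBM stage is to average the estimated cell fractions of all samples that share a stage. The sketch below shows that idea only. The layout of celltype_deconvolution.txt is not described here, so the input is a hypothetical in-memory list of (stage, fractions) pairs, the cell type names are made up, and every sample is assumed to report the same cell types.

```python
from collections import defaultdict


def mean_fraction_by_stage(samples):
    """Average cell fractions over all samples with the same EBM stage.

    `samples` is an iterable of (stage, {cell_type: fraction}) pairs.
    ASSUMPTION: every sample reports every cell type.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for stage, fractions in samples:
        counts[stage] += 1
        for cell_type, fraction in fractions.items():
            totals[stage][cell_type] += fraction
    return {
        stage: {
            cell_type: total / counts[stage]
            for cell_type, total in by_type.items()
        }
        for stage, by_type in sorted(totals.items())
    }


# Hypothetical toy input: two samples at stage 1, one at stage 2.
samples = [
    (1, {"Neurons": 0.6, "Astrocytes": 0.4}),
    (1, {"Neurons": 0.4, "Astrocytes": 0.6}),
    (2, {"Neurons": 0.2, "Astrocytes": 0.8}),
]

stage_means = mean_fraction_by_stage(samples)
```

The resulting per-stage means are the kind of values that can be plotted against the EBM stage.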
########################################################
The commands for removing the skull, CSF etc. and for image parcellation are in 'instruction_for_parcellation.txt'. Please install the FSLeyes software before typing the commands into a terminal.
Two Supplementary datasets can be found in the 'data' folder. Supplementary dataset 1 includes all significantly correlated genes; Supplementary dataset 2 includes all significantly enriched GO terms.