Material and source code for the Neuro_Droso_ND75-KD manuscript
All .fastq files were deposited in ArrayExpress under accession number E-MTAB-XXXXX. The submission comprises two 10x libraries (Pan_neuro_control and Pan_neuro_ND75KD).
Here we follow the standard 10x Genomics Cell Ranger protocol.
cellranger count --id Pan_neuro_control --fastqs=./fastq --sample=Pan_neuro_control --transcriptome=${10x_genome} --expect-cells=10000 --chemistry=auto --include-introns=true
cellranger count --id Pan_neuro_ND75KD --fastqs=./fastq --sample=Pan_neuro_ND75KD --transcriptome=${10x_genome} --expect-cells=10000 --chemistry=auto --include-introns=true
Note: In the manuscript, we used the dm6 genome assembly.
1.0. Generating an individual Seurat object for each library (see full code here: Pan_neuro_control [Rmd] and Pan_neuro_ND75KD [Rmd])
Here we load the .h5 files generated by cellranger in the previous step, filter out low-quality cells, and run the standard Seurat v4.4.0 pipeline.
Rscript -e "rmarkdown::render('./preprocessing/S0a_Pan_neuro_control.Rmd', output_file = './preprocessing/S0a_Pan_neuro_control.html')"
Rscript -e "rmarkdown::render('./preprocessing/S0b_Pan_neuro_ND75KD.Rmd', output_file = './preprocessing/S0b_Pan_neuro_ND75KD.html')"
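The actual QC thresholds live in the S0a/S0b .Rmd files; purely as an illustration, the filtering logic can be sketched in Python with pandas (all barcodes and cutoffs below are hypothetical, not the manuscript's values):

```python
import pandas as pd

# Hypothetical per-cell QC table; columns mirror Seurat's standard QC metadata.
qc = pd.DataFrame({
    "barcode":      ["AAAC-1", "AAAG-1", "AACC-1", "AACG-1"],
    "nCount_RNA":   [12000, 450, 8000, 300000],
    "nFeature_RNA": [3500, 200, 2800, 9000],
    "percent_mt":   [4.2, 35.0, 8.1, 3.0],
})

# Illustrative cutoffs (assumptions, not the manuscript's exact values):
# keep cells with enough detected genes, no doublet-like saturation,
# and a low mitochondrial fraction.
keep = (
    (qc["nFeature_RNA"] > 500)
    & (qc["nFeature_RNA"] < 6000)
    & (qc["percent_mt"] < 15)
)
filtered = qc[keep]
print(filtered["barcode"].tolist())  # barcodes passing QC
```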
1.1. Integration of transcriptomics data with Harmony (see full code here: [Rmd])
Here we load the two previously generated Seurat objects and integrate them with Harmony. We run both a lenient integration (with early_stop = T) and an "over-corrected" integration (with early_stop = F and max_iter = 100) to see which clusters the new ND75KD-specific clusters merge into. Then we run the standard Seurat v4.4.0 pipeline to generate UMAP, t-SNE, and clustering metadata on the integrated Seurat object.
Rscript -e "rmarkdown::render('./preprocessing/S1_Pan_neuro_integrated.Rmd', output_file = './preprocessing/S1_Pan_neuro_integrated.html')"
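To see which cluster absorbs an ND75KD-specific cluster from the lenient integration once over-correction kicks in, one simple diagnostic is a contingency table of the two cluster assignments. A toy pandas sketch (cluster labels are hypothetical):

```python
import pandas as pd

# Toy per-cell cluster assignments under the two Harmony runs
# (hypothetical labels; the real ones come from the integrated object).
cells = pd.DataFrame({
    "lenient":        ["c0", "c0", "c7_ND75KD", "c7_ND75KD", "c2", "c2"],
    "over_corrected": ["c0", "c0", "c2",        "c2",        "c2", "c2"],
})

# The contingency table shows which over-corrected cluster each
# lenient cluster is merged into.
tab = pd.crosstab(cells["lenient"], cells["over_corrected"])
print(tab)
```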
Now that we have generated the integrated Seurat object, we will add the following metadata:
- a) An Oxphos score generated using AUCell from the pySCENIC pipeline (see section 1.3 for how we installed and used pySCENIC)
- b) A manual annotation of the clusters
The goal of this step is to create a score of OXPHOS gene enrichment in each cell, to see which cells are enriched for OXPHOS genes. We do this using the AUCell tool from the pySCENIC pipeline. The first step is to generate a .loom file that can be read by pySCENIC, so we use the SX1_Create_Loom_From_Seurat_Object.Rmd R script, which transforms our previous Seurat object (Pan_neuro_integrated.rds) into a LOOM file (Pan_neuro_integrated.loom).
Then we run the S2a_Pan_neuro_integrated_Oxphos_scoring_AUCell.ipynb Jupyter Notebook, which reads both the Pan_neuro_integrated.loom LOOM file and the OXPHOS gene list (the 117 OXPHOS genes from the FlyBase GO-BP term GO:0006119, oxidative phosphorylation, stored in the Oxphos_genes.xlsx Excel file). This generates Pan_neuro_integrated_117_Oxphos_AUCell_auc.tsv, a cell/score mapping .tsv file that we will use later to incorporate the metadata into our Seurat object.
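For intuition, AUCell scores each cell by ranking its genes by expression and measuring how early the gene-set genes appear in that ranking. A deliberately simplified, pure-NumPy sketch of the idea (toy gene names and values, not the notebook's actual code):

```python
import numpy as np

# Toy expression matrix: rows = genes, columns = cells (hypothetical values).
genes = ["ND-75", "COX5A", "ATPsynB", "elav", "repo", "Gad1"]
expr = np.array([
    [9.0, 0.1],   # ND-75
    [8.0, 0.2],   # COX5A
    [7.5, 0.0],   # ATPsynB
    [1.0, 5.0],   # elav
    [0.5, 6.0],   # repo
    [0.2, 4.0],   # Gad1
])
oxphos = {"ND-75", "COX5A", "ATPsynB"}

def aucell_like(expr, genes, gene_set, top_frac=0.5):
    """Simplified AUCell: area under the recovery curve of gene-set
    genes within the top-ranked genes of each cell, scaled to [0, 1]."""
    n_genes, n_cells = expr.shape
    k = max(1, int(n_genes * top_frac))
    is_hit = np.array([g in gene_set for g in genes])
    scores = []
    for c in range(n_cells):
        order = np.argsort(-expr[:, c])          # rank genes high -> low
        hits = is_hit[order][:k]                 # gene-set hits among top k
        recovery = np.cumsum(hits)               # step recovery curve
        ideal = np.cumsum(np.ones(k)).clip(max=len(gene_set))
        scores.append(recovery.sum() / ideal.sum())  # normalized AUC
    return np.array(scores)

scores = aucell_like(expr, genes, oxphos)
```

Here the first (OXPHOS-high) cell scores 1.0 and the second scores 0.0; the real AUCell implementation handles ties, rank cutoffs, and large sparse matrices properly.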
We generated a marker gene table for each cluster and handed it to our expert for manual annotation of the clusters. We also leveraged an integration with the Fly Cell Atlas head dataset to transfer annotations and help with the curation of all clusters (data not shown). This tedious task generated the Pan_neuro_integrated_markers_annotation.txt file, containing an annotation for each cluster. We propagate this annotation to each cell in the next step.
We do this after all metadata were generated, in step 1.4.
Now, we will run the pySCENIC pipeline in Python, using the aertslab/pyscenic:0.12.1 Docker image available on Docker Hub.
Of note: we simply had to add the ipykernel package to the Docker image to create a Jupyter Notebook kernel (at /usr/local/share/jupyter/kernels/pySCENIC/kernel.json) that we could then load in Jupyter Notebook.
1.3.a Running pySCENIC in Python (see full code here: [ipynb])
Running pySCENIC without errors was far from easy. We point readers to the GitHub issue we created for this purpose: aertslab/pySCENIC/issues/534. It explains the issues we faced and how we resolved them, and may help clarify some of the filtering/mapping steps in the .ipynb file that are not part of the default pySCENIC tutorial.
It is worth noting that we saved all intermediate files as .tsv files (available in ./data). We also applied the binarization method to each regulon, to get a binary output (activated vs. non-activated regulon) instead of a continuous regulon score. Both regulon outputs are then stored in Pan_neuro_integrated_regulons_aucell.tsv and Pan_neuro_integrated_regulons_aucell_binarized.tsv, to be imported later into the Seurat object.
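pySCENIC's binarization fits a two-component mixture model per regulon to pick an on/off threshold. As a rough illustration of the idea only (not pySCENIC's actual algorithm), a two-means split on a toy AUC vector:

```python
import numpy as np

# Toy per-cell AUC scores for one regulon (hypothetical values).
auc = np.array([0.02, 0.03, 0.05, 0.04, 0.41, 0.45, 0.38, 0.50])

def binarize(scores, n_iter=20):
    """Two-means split on a 1-D score vector: a crude stand-in for the
    mixture-model threshold used by pySCENIC's binarization step."""
    lo, hi = scores.min(), scores.max()
    for _ in range(n_iter):
        thr = (lo + hi) / 2.0
        below, above = scores[scores <= thr], scores[scores > thr]
        if len(below) == 0 or len(above) == 0:
            break
        lo, hi = below.mean(), above.mean()  # update group centers
    return (scores > (lo + hi) / 2.0).astype(int)

on_off = binarize(auc)  # 0 = regulon off, 1 = regulon on
```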
We do this after all metadata were generated, in step 1.4.
1.4 Adding all metadata to the final Seurat object (see full code here: [Rmd])
In this step we add the metadata generated in steps 1.2.a, 1.2.b, and 1.3 to the final Seurat object.
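Conceptually, this boils down to joining the per-cell score tables and mapping the per-cluster annotation onto each cell barcode. A pandas sketch with hypothetical barcodes, scores, and annotation labels (the real tables come from the files generated above):

```python
import pandas as pd

# Hypothetical stand-ins for the real files: the AUCell score table
# (step 1.2.a) and the per-cluster annotation table (step 1.2.b).
aucell = pd.DataFrame({
    "cell": ["AAAC-1", "AAAG-1", "AACC-1"],
    "oxphos_auc": [0.42, 0.05, 0.31],
})
meta = pd.DataFrame({
    "cell": ["AAAC-1", "AAAG-1", "AACC-1"],
    "seurat_clusters": [0, 3, 0],
})
annotation = {0: "Dopaminergic neurons", 3: "Glia"}  # hypothetical labels

# Join scores by cell barcode, then propagate the per-cluster
# annotation to every cell in that cluster.
meta = meta.merge(aucell, on="cell", how="left")
meta["annotation"] = meta["seurat_clusters"].map(annotation)
```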
This step finishes the preprocessing of the Seurat object and generates a .rds file (also available for download on ArrayExpress). Next, we use the SX1_Create_Loom_From_Seurat_Object.Rmd R script to transform the Seurat object into a LOOM file, which we upload to our ASAP interactive portal, asap.epfl.ch. From this portal, you can visualize the dataset and download it as an h5ad or LOOM file.
[ASAP final link to be provided once published]
All manuscript figures are reproducible through scripts deposited here
These figures use the final .rds output file generated at step 1.4, which is also available for download on ArrayExpress.