-
Notifications
You must be signed in to change notification settings - Fork 4
Example of results of Individual Analysis for a school case
These results were generated on the Flamingo computing cluster of GR.
For this example, I use the sc5p_v2_hs_PBMC_1k_5 sample from 10XGenomics (found here: https://support.10xgenomics.com/single-cell-vdj/datasets/4.0.0/sc5p_v2_hs_PBMC_1k).
It is Human PBMC sample from a Healthy Donor with:
- Gene Expression,
- BCR,
- TCR,
- ADT (named Feature Barcoding by 10XGenomics).
This sample contains average 1k cells and it was made by v2 chemistry technology.
In sc5p_v2_hs_PBMC_1k_feature_ref.csv file (found here too: https://support.10xgenomics.com/single-cell-vdj/datasets/4.0.0/sc5p_v2_hs_PBMC_1k), we have the list of protein names and sequence of the ADT tag.
sc5p_v2_hs_PBMC_1k_feature_ref.csv
id | name | read | pattern | sequence | feature_type |
---|---|---|---|---|---|
CD3 | CD3_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | CTCATTGTAACTCCT | Antibody Capture |
CD19 | CD19_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | CTGGGCAATTACTCG | Antibody Capture |
CD45RA | CD45RA_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | TCAATCCTTCCGCTT | Antibody Capture |
CD4 | CD4_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | TGTTCCCGCTCAACT | Antibody Capture |
CD8a | CD8a_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | GCTGCGCTTTCCATT | Antibody Capture |
CD14 | CD14_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | TCTCAGACCTCCGTA | Antibody Capture |
CD16 | CD16_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | AAGTTCACTCTTTGC | Antibody Capture |
CD56 | CD56_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | TTCGCCGCATTGAGT | Antibody Capture |
CD25 | CD25_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | TTTGTCCTGTACGCC | Antibody Capture |
CD45RO | CD45RO_TotaC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | CTCCGAATCATGTTG | Antibody Capture |
PD-1 | PD-1_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | ACAGCGCCGTATTTA | Antibody Capture |
TIGIT | TIGIT_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | TTGCTTACCGCCAGA | Antibody Capture |
IgG1 | IgG1_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | GCCGGACGACATTAA | Antibody Capture |
IgG2a | IgG2a_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | CTCCTACCTAAACTG | Antibody Capture |
IgG2b | IgG2b_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | ATATGTATCACGCGA | Antibody Capture |
CD127 | CD127_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | GTGTGTTGTCCTATG | Antibody Capture |
CD15 | CD15_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | TCACCAGTACCTAGT | Antibody Capture |
CD197_(CCR7) | CD197_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | AGTTCAGTCAACCGA | Antibody Capture |
HLA-DR | HLA-DR_TotalC | R2 | ^NNNNNNNNNN(BC)NNNNNNNNN | AATAGCGAGCAAGTA | Antibody Capture |
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT.fa
>CD3
CTCATTGTAACTCCT
>CD19
CTGGGCAATTACTCG
>CD45RA
TCAATCCTTCCGCTT
>CD4
TGTTCCCGCTCAACT
>CD8a
GCTGCGCTTTCCATT
>CD14
TCTCAGACCTCCGTA
>CD16
AAGTTCACTCTTTGC
>CD56
TTCGCCGCATTGAGT
>CD25
TTTGTCCTGTACGCC
>CD45RO
CTCCGAATCATGTTG
>PD-1
ACAGCGCCGTATTTA
>TIGIT
TTGCTTACCGCCAGA
>IgG1
GCCGGACGACATTAA
>IgG2a
CTCCTACCTAAACTG
>IgG2b
ATATGTATCACGCGA
>CD127
GTGTGTTGTCCTATG
>CD15
TCACCAGTACCTAGT
>CCR7
AGTTCAGTCAACCGA
>HLA-DR
AATAGCGAGCAAGTA
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT_tr2gs.txt
transcript gene gene_name
CD3 CD3 CD3
CD19 CD19 CD19
CD45RA CD45RA CD45RA
CD4 CD4 CD4
CD8a CD8a CD8a
CD14 CD14 CD14
CD16 CD16 CD16
CD56 CD56 CD56
CD25 CD25 CD25
CD45RO CD45RO CD45RO
PD-1 PD-1 PD-1
TIGIT TIGIT TIGIT
IgG1 IgG1 IgG1
IgG2a IgG2a IgG2a
IgG2b IgG2b IgG2b
CD127 CD127 CD127
CD15 CD15 CD15
CCR7 CCR7 CCR7
HLA-DR HLA-DR HLA-DR
cd '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/'
kallisto index 'ADT.fa' -i 'ADT.kidx' --kmer-size=15
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/markfile.xlsx
Genes | Signatures |
---|---|
IL7R | Naive_CD4_T |
CCR7 | Naive_CD4_T |
CD14 | CD14_Mono |
LYZ | CD14_Mono |
IL7R | Memory_CD4 |
S100A4 | Memory_CD4 |
MS4A1 | B |
CD8A | CD8_T |
FCGR3A | FCGR3A_Mono |
MS4A7 | FCGR3A_Mono |
GNLY | NK |
NKG7 | NK |
FCER1A | DC |
CST3 | DC |
PPBP | Platelet |
As specified in the Pipeline phylosophy (Pipeline details section of this wiki), single-cell data analysis cannot be done through a one-shot pipeline.
So, the rest of the explanation will be a repetition of several phases:
- Make the Configuration Parameter file
- Launch of the analysis
- Interpretation of Results (and Choice of Next Parameters)
- See Arborescence of all results for understand the arborecence.
- The .rda files are snapshots of data at key points in the analysis.
We will do all the alignments, and evaluate the quality of the droplets.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Params.yaml
Steps: ["Alignment_countTable_GE","Alignment_countTable_ADT","Alignment_annotations_TCR_BCR","Droplets_QC_GE"]
############################################ GE (RNA) ############################################
Alignment_countTable_GE:
### Project
sample.name.ge : ["sc5p_v2_hs_PBMC_1k_5gex"]
input.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/'
output.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/'
sctech : '10xv2'
kindex.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir.kidx'
tr2g.file.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir_tr2gs.txt'
reference.txt: "Ensembl reference transcriptome v99 corresponding to the homo sapiens GRCH38 build"
Droplets_QC_GE:
author.name : "marine aglave"
author.mail : "marine.aglave@gustaveroussy.fr, bigr@gustaveroussy.fr"
############################################ ADT ############################################
Alignment_countTable_ADT:
sample.name.adt : ["sc5p_v2_hs_PBMC_1k_5fb"]
input.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
kindex.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT.kidx"
tr2g.file.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT_tr2gs.txt"
############################################ TCR/BCR ############################################
Alignment_annotations_TCR_BCR:
sample.name.tcr : ["sc5p_v2_hs_PBMC_1k_t"]
input.dir.tcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
sample.name.bcr : ["sc5p_v2_hs_PBMC_1k_b"]
input.dir.bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.tcr_bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
For the traceability of the analysis, I prefer to put the command lines in a script, but it is not mandatory.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh
#!/bin/bash
########################################################################
## Single-cell script to launch single-cell pipeline
##
## using: sbatch /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh
########################################################################
#SBATCH --job-name=pipeline_sc
#SBATCH --nodes=1
#SBATCH --cpus-per-task=1
#SBATCH --mem=1G
#SBATCH --partition=mediumq
conda activate /mnt/beegfs/userdata/m_aglave/.environnement_conda/single_cell_user
module load singularity
path_to_configfile="/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Params.yaml"
path_to_pipeline="/mnt/beegfs/pipelines/single-cell"
snakemake --profile ${path_to_pipeline}/profiles/slurm -s ${path_to_pipeline}/Snakefile --configfile ${path_to_configfile}
conda deactivate
sbatch /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh
#provided by the Alignment_countTable_GE step
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/QC_reads/sc5p_v2_hs_PBMC_1k_5gex_GE_RAW.html
#provided by the Alignment_countTable_ADT step
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5fb_ADT/QC_reads/sc5p_v2_hs_PBMC_1k_5fb_ADT_RAW.html
#provided by the Alignment_annotations_TCR_BCR step
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_t_TCR/QC_reads/sc5p_v2_hs_PBMC_1k_t_TCR_RAW.html
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_b_BCR/QC_reads/sc5p_v2_hs_PBMC_1k_b_BCR_RAW.html
It is a classical way to make controle-quality of reads. We can check the number of reads, the reads lentgh (R1 = Barcodes + UMI; R2 = ARN) and other usual metrics.
If there are duplicated reads, it's totally normal, it's the role of UMI to correct the PCR duplicates.
provided by the Alignment_countTable_GE step
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/KALLISTOBUS/run_info.json
{
"n_targets": 0,
"n_bootstraps": 0,
"n_processed": 79556908,
"n_pseudoaligned": 51554724,
"n_unique": 11755362,
"p_pseudoaligned": 64.8,
"p_unique": 14.8,
"kallisto_version": "0.46.2",
"index_version": 0,
"start_time": "Fri Apr 16 18:56:49 2021",
"call": "kallisto bus -i /mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir.kidx -o /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/KALLISTOBUS -x 10xv2 -t 4 /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/fastq/sc5p_v2_hs_PBMC_1k_5gex_GE_S1_L001_R1_001.fastq.gz /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/fastq/sc5p_v2_hs_PBMC_1k_5gex_GE_S1_L001_R2_001.fastq.gz /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/fastq/sc5p_v2_hs_PBMC_1k_5gex_GE_S1_L002_R1_001.fastq.gz /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/fastq/sc5p_v2_hs_PBMC_1k_5gex_GE_S1_L002_R2_001.fastq.gz"
}
provided by the Alignment_countTable_ADT step
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5fb_ADT/KALLISTOBUS/run_info.json
{
"n_targets": 0,
"n_bootstraps": 0,
"n_processed": 11043775,
"n_pseudoaligned": 10520338,
"n_unique": 10520338,
"p_pseudoaligned": 95.3,
"p_unique": 95.3,
"kallisto_version": "0.46.2",
"index_version": 0,
"start_time": "Fri Apr 16 18:52:01 2021",
"call": "kallisto bus -i /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/INDEX_ADT/ADT.kidx -o /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/sc5p_v2_hs_PBMC_1k_5fb_ADT/KALLISTOBUS -x 10xv2 -t 1 /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/fastq/sc5p_v2_hs_PBMC_1k_5fb_ADT_S1_L001_R1_001.fastq.gz /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/fastq/sc5p_v2_hs_PBMC_1k_5fb_ADT_S1_L001_R2_001.fastq.gz /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/fastq/sc5p_v2_hs_PBMC_1k_5fb_ADT_S1_L002_R1_001.fastq.gz /mnt/beegfs/userdata/m_aglave/pipeline/test_new_data/Results/fastq/sc5p_v2_hs_PBMC_1k_5fb_ADT_S1_L002_R2_001.fastq.gz"
}
- n_processed : number of fragments that have been processed.
- n_pseudoaligned : number of fragments that have been pseudoaligned.
- p_pseudoaligned : percentage of fragments that have been pseudoaligned.
- n_unique : number of fragments that have been pseudoaligned to a unique target sequence.
- p_unique : percentage of fragments that have been pseudoaligned to a unique target sequence.
- kallisto_version : version of kallisto tool used for the alignment.
- call : command line used to run the alignment.
Provided by the Droplets_QC_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/QC_droplets/sc5p_v2_hs_PBMC_1k_5gex_GE_saturation_plot.png
These graphs allow to estimate the complexity of the data (more genes is detected per UMI, more complex is our data) and therefore estimate the saturation of the sequencing.
The first graph is a representation of the number of genes by droplet acording to the number of UMIs by droplet. The droplet with the less of genes and the less of UMI are the empty droplets (containing the ambiant RNA). The red line is the trend curve. If the curve forms a plateau at the top, then all the genes have been discovered.
The second graph is a representation of the density, according to the ratio between genes and umi for each droplet. Generally, we expect the novelty score to be above 0.80. The dark blue curve, representing full droplets, shows a peak in density demonstrating that there is a large set of droplets with high complexity. Low complexity droplets (<0.80) are clearly identified as empty droplets. The empty droplet peak on the right should correspond to droplets where only 1UMI (and therefore only 1 gene) has been detected, granting a ratio of 1.
Note:
More information about the second graph here.
Provided by the Droplets_QC_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/QC_droplets/sc5p_v2_hs_PBMC_1k_5gex_GE_kneeplot.png
It's a representation of the number of UMIs by droplet rank. The top plateau corresponds to the real cells and the bottom plateau corresponds to the empty droplets. We must to see this 2 plateaus for a good identification of empty droplets.
Between this 2 plateaus, there are little cells or heterogeneous empty droplets. The Knee Line and the Inflection Line are the caracteristic lines computed by the emptyDrops() function.
Provided by the Droplets_QC_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/QC_droplets/sc5p_v2_hs_PBMC_1k_5gex_GE_QChist.png
At this step, the empty droplets are removed.
It's a representation of the cells frequencies according to the number of genes (nFeatures_RNA), the number of transcripts (nCount_RNA), the percentage of mitochondrial transcripts (%mito) and the percentage of ribosomal transcripts (%ribo). The red lines are the thresholds choosen. In the title of each histogram there is the threshold and the the number of kept cell if the threshold is applied.
Cells must have less than 20% of mitochondrial RNA for publication, and in general, we use a value within 5% (20%, 15%, 10% or 5%). Here many cells (947 cells) have less than 20% so the cells are not dying. We could set the threshold at 15%, or even 10% (to be really stringent), to keep only the best quality cells. So for the Filtering_GE step, we will choose 15% as the maximum threshold of the percentage of mitochondrial RNA.
For the percentage of ribosomal RNA are rarely filtered.
For the number of genes and the number of transcripts, the filtering depends on data.
We will do the filtering of droplets, and evaluate the identification of doublets.
We keep the same Configuration Parameter file, but we add the Filtering_GE step, so the parameters of the sample.name.ge and input.rda.ge will be determined automatically. We only have to add the special parameters of the Filtering_GE step (and add "Filtering_GE"
into Steps
).
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Params.yaml
Steps: ["Alignment_countTable_GE","Alignment_countTable_ADT","Alignment_annotations_TCR_BCR","Droplets_QC_GE","Filtering_GE"]
############################################ GE (RNA) ############################################
Alignment_countTable_GE:
### Project
sample.name.ge : ["sc5p_v2_hs_PBMC_1k_5gex"]
input.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/'
output.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/'
sctech : '10xv2'
kindex.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir.kidx'
tr2g.file.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir_tr2gs.txt'
reference.txt: "Ensembl reference transcriptome v99 corresponding to the homo sapiens GRCH38 build"
Droplets_QC_GE:
author.name : "marine aglave"
author.mail : "marine.aglave@gustaveroussy.fr, bigr@gustaveroussy.fr"
Filtering_GE:
pcmito.max : 0.15
############################################ ADT ############################################
Alignment_countTable_ADT:
sample.name.adt : ["sc5p_v2_hs_PBMC_1k_5fb"]
input.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
kindex.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT.kidx"
tr2g.file.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT_tr2gs.txt"
############################################ TCR/BCR ############################################
Alignment_annotations_TCR_BCR:
sample.name.tcr : ["sc5p_v2_hs_PBMC_1k_t"]
input.dir.tcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
sample.name.bcr : ["sc5p_v2_hs_PBMC_1k_b"]
input.dir.bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.tcr_bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
No change in /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh script.
sbatch /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh
Provided by the Filtering_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSKEPT/sc5p_v2_hs_PBMC_1k_5gex_GE_QChist.png
It is the same histogram as before, but after filtering by the number of genes, the number of transcripts, the percentage of mitochondrial transcripts and the percentage of ribosomal transcripts, and before removing the doublets.
Here there are 907 cells which have less than 15% of mitochondrial RNA.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/sc5p_v2_hs_PBMC_1k_5gex_GE_QChist.png
It is the same histogram as before but after filtering by the number of genes, the number of transcripts, the percentage of mitochondrial transcripts and the percentage of ribosomal transcripts, and after removing the doublets.
Here there are 873 cells which have less than 15% of mitochondrial RNA.
Provided by the Filtering_GE step. Cell cycle is compute at this step, followed by the doublets indentification. To be sure that the identified doublets do not only correspond to cells in the G2M phase of the cell cycle, a full data analysis is performed with default settings.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSKEPT/LogNormalize/pca/pca_15_0.8/sc5p_v2_hs_PBMC_1k_5gex_GE_RNA_pca_uMAP_dim15_res0.8.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSKEPT/LogNormalize/pca/pca_15_0.8/sc5p_v2_hs_PBMC_1k_5gex_GE_RNA_pca_uMAP3d_dim15_res0.8.png
These graphs correspond to the 2D and 3D umaps of the data, and they are not really important at this step of the analysis because it is made with default parameters.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSKEPT/LogNormalize/pca/pca_15_0.8/technical/sc5p_v2_hs_PBMC_1k_5gex_GE_technical_MULTI_ALL_uMAPs.png
This group of graphs corresponds to the plotting of biases on the umap. We can see doublets (compute by the 2 available methods), the attribution of the cell cycle (compute by seurat and cyclone) and the other biases.
Here, we can see that cells in G2M phase are not just doublets and vice versa. So no problem, we can filtering doublets. If there is a correspondance maybe you only need to use one doublets detection method, or not filter them.
The other graphs in this folder (technical) are not really usefull, they correspond to the grouped graphs.
Provided by the Filtering_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/sc5p_v2_hs_PBMC_1k_5gex_GE_stat.txt
Kallisto_Bustools_alignment.Total_reads | 79556908 |
---|---|
Kallisto_Bustools_alignment.Pseudo_aligned_reads | 51554724 |
Kallisto_Bustools_alignment.Pseudo_aligned_reads_percent | 64,8 |
Kallisto_Bustools_alignment.Pseudo_aligned_reads_to_unique_target_sequence | 11755362 |
Kallisto_Bustools_alignment.Pseudo_aligned_reads_to_unique_target_sequence_percent | 14,8 |
Droplet_Quality.captured_droplet | 126489 |
Droplet_Quality.total_number_UMI | 6396083 |
Droplet_Quality.estimated_cells | 997 |
Droplet_Quality.estimated_UMI | 5919852 |
Droplet_Quality.fraction_read_in_cells | 0,925543336 |
Cells_Quality.mito_summary.Median | 0,060621498 |
Cells_Quality.ribo_summary.Median | 0,251040222 |
Cells_Quality.stress_summary.Median | 0,024519549 |
Cells_Quality.filter_cells_genes | 984 |
Cells_Quality.filter_cells_counts | 967 |
Cells_Quality.genes_per_cell_summary.Median | 1952 |
Cells_Quality.UMI_per_cell_summary.Median | 5813 |
After_QC_cells_filtering.estim_cells_G1 | 521 |
After_QC_cells_filtering.estim_cells_G2M | 71 |
After_QC_cells_filtering.estim_cells_S | 315 |
After_QC_cells_filtering.Genes_covered | 17374 |
After_QC_cells_feature_filtering.estim_doublets | 34 |
Final_Cells_Quality.mito_summary.Median | 0,059024654 |
Final_Cells_Quality.ribo_summary.Median | 0,268956044 |
Final_Cells_Quality.stress_summary.Median | 0,025089606 |
Final_Cells_Quality.genes_per_cell_summary.Median | 1957 |
Final_Cells_Quality.UMI_per_cell_summary.Median | 5887 |
Final_Cells_Quality.nb_genes | 17374 |
Final_Cells_Quality.nb_cells | 873 |
This file summarizes some statistics throughout the filters performed (the filters are accumulated progressively in the table).
Alignment | |
---|---|
Kallisto_Bustools_alignment.Total_reads | number of total detected reads |
Kallisto_Bustools_alignment.Pseudo_aligned_reads | number of total pseudo-aligned reads |
Kallisto_Bustools_alignment.Pseudo_aligned_reads_percent | percentage of total pseudo-aligned reads |
Kallisto_Bustools_alignment.Pseudo_aligned_reads_to_unique_target_sequence | number of total pseudo-aligned reads to a unique location on the reference |
Kallisto_Bustools_alignment.Pseudo_aligned_reads_to_unique_target_sequence_percent | percentage of total pseudo-aligned reads to a unique location on the reference |
Raw data | |
---|---|
Droplet_Quality.captured_droplet | number of total detected droplets |
Droplet_Quality.total_number_UMI | number of total UMI (after UMI deduplication) |
After removing empty droplets | |
---|---|
Droplet_Quality.estimated_cells | number of cells after removing empty droplets |
Droplet_Quality.estimated_UMI | number of UMI after removing empty droplets |
Droplet_Quality.fraction_read_in_cells | the number of UMI after removing empty droplets divided by the number of UMI before removing empty droplets |
Cells_Quality.mito_summary.Median | median of mitochondrial RNA after removing empty droplets, before other filters |
Cells_Quality.ribo_summary.Median | median of ribosomal RNA after removing empty droplets, before other filters |
Cells_Quality.stress_summary.Median | median of mecanis stress RNA after removing empty droplets, before other filters |
Cells_Quality.filter_cells_genes | number of cells after removing empty droplets, if filtering on genes |
Cells_Quality.filter_cells_counts | number of cells after removing empty droplets, if filtering on counts |
Cells_Quality.genes_per_cell_summary.Median | median of the number of gene per cell after removing empty droplets, before other filters |
Cells_Quality.UMI_per_cell_summary.Median | median of the number of UMI per cell after removing empty droplets, before other filters |
After removing bad quality cells | |
---|---|
After_QC_cells_filtering.estim_cells_G1 | number of cells in G1 phase, after filtering |
After_QC_cells_filtering.estim_cells_G2M | number of cells in G2M phase, after filtering |
After_QC_cells_filtering.estim_cells_S | number of cells in S phase, after filtering |
After_QC_cells_filtering.Genes_covered | number of genes, after filtering |
After removing rare genes | |
---|---|
After_QC_cells_feature_filtering.estim_doublets | number of doublets, after filtering genes |
After all filtering | |
---|---|
Final_Cells_Quality.mito_summary.Median | median of mitochondrial RNA after all filtering |
Final_Cells_Quality.ribo_summary.Median | median of ribosomal RNA after all filtering |
Final_Cells_Quality.stress_summary.Median | median of mecanic stress RNA after all filtering |
Final_Cells_Quality.genes_per_cell_summary.Median | median of the number of gene per cell after all fitering |
Final_Cells_Quality.UMI_per_cell_summary.Median | median of the number of UMI per cell after all fitering |
Final_Cells_Quality.nb_genes | total number of genes |
Final_Cells_Quality.nb_cells | total number of cells |
At the end of all filters, we have 873 cells and 17 374 genes.
We will do the normalization and the dimension reduction, and evaluate the biais and the clustering.
We keep the same Configuration Parameter file, but we add the Norm_DimRed_Eval_GE step, so the parameters of the sample.name.ge and input.rda.ge will be determined automatically. We only have to add the special parameters of the Norm_DimRed_Eval_GE step (and add "Norm_DimRed_Eval_GE"
into Steps
).
To simplify the explanation, we will focus on an analysis by SCTransform
and pca
, but for an optimal analysis, we should test the normalization by LogNormalize
followed by a dimension reduction by scbfa
, then compare the methods at the umap level (number, shape and coherence of the clusters).
Note:
SCTransform
and pca
correspond to default parameters of Norm_DimRed_Eval_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Params.yaml
Steps: ["Alignment_countTable_GE","Alignment_countTable_ADT","Alignment_annotations_TCR_BCR","Droplets_QC_GE","Filtering_GE","Norm_DimRed_Eval_GE"]
############################################ GE (RNA) ############################################
Alignment_countTable_GE:
### Project
sample.name.ge : ["sc5p_v2_hs_PBMC_1k_5gex"]
input.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/'
output.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/'
sctech : '10xv2'
kindex.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir.kidx'
tr2g.file.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir_tr2gs.txt'
reference.txt: "Ensembl reference transcriptome v99 corresponding to the homo sapiens GRCH38 build"
Droplets_QC_GE:
author.name : "marine aglave"
author.mail : "marine.aglave@gustaveroussy.fr, bigr@gustaveroussy.fr"
Filtering_GE:
pcmito.max : 0.15
Norm_DimRed_Eval_GE:
eval.markers : "GAPDH"
############################################ ADT ############################################
Alignment_countTable_ADT:
sample.name.adt : ["sc5p_v2_hs_PBMC_1k_5fb"]
input.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
kindex.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT.kidx"
tr2g.file.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT_tr2gs.txt"
############################################ TCR/BCR ############################################
Alignment_annotations_TCR_BCR:
sample.name.tcr : ["sc5p_v2_hs_PBMC_1k_t"]
input.dir.tcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
sample.name.bcr : ["sc5p_v2_hs_PBMC_1k_b"]
input.dir.bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.tcr_bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
No change in /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh script.
sbatch /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh
Provided by the Norm_DimRed_Eval_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca_dims.bias.cor.png
This graph represent the correlation between potential biases and each dimension, after nomalization and dimension reduction.
In general, the first dimensions are the most correlated with biases.
From experience, biases with more than 0.3 of correlation with one of the dimensions must be corrected, but of course, this is not always the case.
Note that the number of transcripts (nCount_RNA) is related to the percentages of mitochondrial, ribosomal and mechanical stress RNA. Similarly, the number of transcripts is correlated with the number of genes (nFeature_RNA). So the correction of this number of transcripts will influence the correlation of the other biases with the dimensions. So it is better to test several combinations of factors to correct.
Here, we observe a strong correlation of the number of transcripts (nCount_RNA), the percentage of ribosomal RNA and mechanical stress. Also, we have indicated GAPDH as a marker gene (here it is a negative control gene, so it should not vary with size; conversely, a positive control gene should vary).
Therefore, we will repeat this step with a correction of the number of transcripts.
Notes:
- As we do not know if the percentage of ribosomal RNA corresponds to the cellular activity (which could be interesting to preserve), the correction of this bias depends on the project. Often, I propose 2 versions of results: with and without correction of this bias.
- Mechanical stress genes rarely have a real influence on cell separation, even though it is strongly correlated with dimensions because the percentage of stress RNA is often low in the cells (less than 5%).
- If the correlations are all very low (between 0 and 0.1) for all dimensions, the umap will most likely be a large central patatoid. The explanation is that the correction applied has also corrected the biological variations of interest at the same time as the bias. In this case it is better not to correct.
- Sometimes the correction worsens the phenomenon (the correlations increase with the correction) and it is better not to correct. This is due to the interdependence of the biases explained above.
- If you have any doubt about the correction, it is better not to correct. The impact of the biases will be visible in the final results and it will be easier to evaluate whether they should be corrected or not. If so, you can easily go back to this step and correct.
We will do the normalization and the dimension reduction, and evaluate the biais and the clustering.
We keep the same Configuration Parameter file, but we add the correction of the number of transcripts in the Norm_DimRed_Eval_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Params.yaml
Steps: ["Alignment_countTable_GE","Alignment_countTable_ADT","Alignment_annotations_TCR_BCR","Droplets_QC_GE","Filtering_GE","Norm_DimRed_Eval_GE"]
############################################ GE (RNA) ############################################
Alignment_countTable_GE:
### Project
sample.name.ge : ["sc5p_v2_hs_PBMC_1k_5gex"]
input.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/'
output.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/'
sctech : '10xv2'
kindex.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir.kidx'
tr2g.file.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir_tr2gs.txt'
reference.txt: "Ensembl reference transcriptome v99 corresponding to the homo sapiens GRCH38 build"
Droplets_QC_GE:
author.name : "marine aglave"
author.mail : "marine.aglave@gustaveroussy.fr, bigr@gustaveroussy.fr"
Filtering_GE:
pcmito.max : 0.15
Norm_DimRed_Eval_GE:
eval.markers : "GAPDH"
vtr.biases : "nCount_RNA"
############################################ ADT ############################################
Alignment_countTable_ADT:
sample.name.adt : ["sc5p_v2_hs_PBMC_1k_5fb"]
input.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
kindex.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT.kidx"
tr2g.file.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT_tr2gs.txt"
############################################ TCR/BCR ############################################
Alignment_annotations_TCR_BCR:
sample.name.tcr : ["sc5p_v2_hs_PBMC_1k_t"]
input.dir.tcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
sample.name.bcr : ["sc5p_v2_hs_PBMC_1k_b"]
input.dir.bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.tcr_bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
No change in /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh script.
sbatch /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh
Provided by the Norm_DimRed_Eval_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform_nCount_RNA/pca/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca_dims.bias.cor.png
Here the correction is not very convincing, the regressed bias seems to be even more correlated than initially (not more strongly (the maximum having moved from dimension 5 to dimension 4), but on much dimensions). No change for mitochondrial or stress RNA, nor for GAPDH or cell cycle genes (cyclone).
Only the correlation with ribosomal RNA shows a decrease at dimension 5, but remains stable for the other dimensions.
In the end, this regression doesn't bring any improvement, so it is better to continue with the uncorrected version (If you have any doubt about the correction, it is better not to correct.).
Provided by the Norm_DimRed_Eval_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/uMAPs/
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca3_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca7_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca9_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca11_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca13_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca15_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca17_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca19_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca21_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca23_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca25_ALLres.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/uMAPs/sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca25_ALLres.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca25_ALLres"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca27_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca29_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca31_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca33_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca35_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca37_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca39_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca41_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca43_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca45_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca47_ALLres.png
sc5p_v2_hs_PBMC_1k_5gex_GE_uMAPs_SCT_pca49_ALLres.png
To identify the number of dimensions to keep for clustering as well as the adequate resolution, the pipeline has drawn all possible umaps according to these 2 parameters. By default, it will test between 5 and 49 dimensions with a step of 2, and all resolutions between 0.1 and 1.2 with a step of 0.1.
We have to look at all the umaps and choose the one that seems to be the most "beautiful": cluster well isolated from each other, cells well grouped within its cluster, in accordance with the expected number of clusters (hence the importance of having an estimate of the number of cell types present in the sample).
It is important to know that choosing a low number of dimensions will not capture all the interesting biological information. And, on the contrary, choosing a high number of dimensions will capture all the interesting biological information but also noise. The threshold between the dimensions that bring information and those that bring noise is difficult to detect. However, at the beginning, the more dimensions we add, the more relevant biological information we add, and the more "beautiful" the umaps will be; until a moment when the more dimensions we continue to add, the more noise we will add and the more the umaps will degrade. So you have to take an umap just before the "degradation".
Note:
To choose the right number of dimensions, I look at the umap with a resolution of 1.2 (high resolution), then I choose 2 or 3 numbers of dimensions to keep (depending on the degradation of the umap), then I look at their lower resolution. The evolution of the number of clusters must increase (with the increase of the resolution) and remain coherent for the same number of dimensions (no cells which constantly change cluster, or clusters which appear then disappear). This verification can also be done by the clustree plot.
The clutree plot is a tree plot to observe the influence of a parameter on the results.
Here we measure the evolution of the membership of cells to a cluster:
- the resolution is fixed and the number of dimensions to keep evolves:
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution
*Provided by the Norm_DimRed_Eval_GE step.*sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.1.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.1.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.1"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.2.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.2.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.2"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.3.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.3.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.3"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.4.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.4.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.4"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.5.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.5.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.5"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.6.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.6.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.6"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.7.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.7.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.7"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.8.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.8.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.8"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.9.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.9.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res0.9"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.0.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.0.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.0"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.1.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.1.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.1"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.2.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/louvain_resolution/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.2.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_res1.2"></p>
- the number of dimensions to keep is fixed and the resolution evolves:
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/
*Provided by the Norm_DimRed_Eval_GE step.*sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca3.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca3.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca3"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca5.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca5.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca5"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca7.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca7.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca7"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca9.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca9.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca9"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca11.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca11.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca11"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca13.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca13.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca13"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca15.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca15.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca15"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca17.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca17.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca17"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca19.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca19.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca19"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca21.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca21.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca21"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca23.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca23.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca23"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca25.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca25.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca25"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca27.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca27.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca27"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca29.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca29.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca29"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca31.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca31.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca31"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca33.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca33.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca33"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca35.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca35.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca35"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca37.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca37.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca37"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca39.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca39.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca39"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca41.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca41.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca41"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca43.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca43.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca43"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca45.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca45.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca45"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca47.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca47.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca47"></p>
sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca49.png
<p align="center"><img src="https://github.com/gustaveroussy/single-cell/blob/master/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/clustree_SCT_pca/dimensions/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca49.png" title="sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca49"></p>
The goal is that the membership of the cells in a cluster remains relatively stable.
Here, the umap are very similar between 19 and 35 dimensions, except for clusters 6 and 7 which sometimes disassociate from each other. From 37 dimensions, the umap become blurred with less different cell types. Some noise is certainly captured. We can choose 35 dimensions, just before degradation. Keeping only 15 dimensions could also be interesting because it seems that clusters 1,2,6 and 10 of the first 15 dimensions (resolution 1.2) correspond to clusters 0 and 2 of the first 35 dimensions (resolution 1.2). So to go from the results of the first 15 dimensions to the first 35 dimensions, either information has been added or noise has been added. Since the version with 35 dimensions shows a real stability, I assume the first one (information).
Notes:
- The best practice is to choose at least 2 resolutions:
- one lower to see main cell types,
- one higher to see cell sub-types.
- If in doubt, it is better to use a resolution that is too high, because clusters that should not have been separated can always be joined (the reverse is not possible, you would have to re-cluster).
Here, we can choose a 0.3 and 1.2 resolutions.
What interests me between these 2 resolutions:
- a part of cluster 6 is attached to cluster 3 in resolution 1.2, while it is part of cluster 3 in resolution 0.3. This is a re-assignment of cell membership (reproducible over several choices of the resolution parameter), and I don't know which version is the right one.
- clusters 1 and 4 in resolution 1.2 are assembled in a single cluster 0 in resolution 0.3. However, given the proximity and the shape of these clusters, I think that they are the same cell type.
With 35 dimensions, we can see a very small group with 2-3 cells at the bottom of the graph. Maybe it might be interesting to test higher resolutions to isolate those few cells in a cluster.
For the rest of the explanation, I will only show the results with 35 dimension and 1.2 in resolution, but the interpretation is transposable on the version of the results with a resolution of 0.3.
We will do the clustering, find the marker genes, make the annotation, add ADT, add TCR, add BCR and convert main results into a cerebro object.
We keep the same Configuration Parameter file, but we add Clust_Markers_Annot_GE, Adding_ADT, Adding_TCR, Adding_BCR and Cerebro steps, with specific parameters for Clust_Markers_Annot_GE and Adding_ADT steps.
Reminder: we start again from the uncorrected version.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Params.yaml
Steps: ["Alignment_countTable_GE","Alignment_countTable_ADT","Alignment_annotations_TCR_BCR","Droplets_QC_GE","Filtering_GE","Norm_DimRed_Eval_GE","Clust_Markers_Annot_GE","Adding_ADT","Adding_TCR","Adding_BCR","Cerebro"]
############################################ GE (RNA) ############################################
Alignment_countTable_GE:
### Project
sample.name.ge : ["sc5p_v2_hs_PBMC_1k_5gex"]
input.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/'
output.dir.ge : '/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/'
sctech : '10xv2'
kindex.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir.kidx'
tr2g.file.ge : '/mnt/beegfs/database/bioinfo/single-cell/INDEX/KB-python_KALLISTO/0.24.4_0.46.2/homo_sapiens/GRCh38/Ensembl/r99/cDNA_LINCs_MIRs/GRCH38_r99_cDNA_linc_mir_tr2gs.txt'
reference.txt: "Ensembl reference transcriptome v99 corresponding to the homo sapiens GRCH38 build"
Droplets_QC_GE:
author.name : "marine aglave"
author.mail : "marine.aglave@gustaveroussy.fr, bigr@gustaveroussy.fr"
Filtering_GE:
pcmito.max : 0.15
Norm_DimRed_Eval_GE:
eval.markers : "GAPDH"
Clust_Markers_Annot_GE:
markfile : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/markfile.xlsx"
keep.dims : 35
keep.res : 1.2
############################################ ADT ############################################
Alignment_countTable_ADT:
sample.name.adt : ["sc5p_v2_hs_PBMC_1k_5fb"]
input.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
kindex.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT.kidx"
tr2g.file.adt : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/INDEX_ADT/ADT_tr2gs.txt"
Adding_ADT:
gene.names: "CD3G,CD19,PTPRC,CD4,CD8A,CD14,FCGR3A,NCAM1,IL2RA,PTPRC,PDCD1,TIGIT,IGHG1,IGHG2,IGHG2,IL7R,FUT4,CCR7,HLA-DRA"
############################################ TCR/BCR ############################################
Alignment_annotations_TCR_BCR:
sample.name.tcr : ["sc5p_v2_hs_PBMC_1k_t"]
input.dir.tcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
sample.name.bcr : ["sc5p_v2_hs_PBMC_1k_b"]
input.dir.bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/fastq_10X/"
output.dir.tcr_bcr : "/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/"
No change in /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh script.
sbatch /mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/launcher.sh
Provided by the Clust_Markers_Annot_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca_uMAP_dim35_res1.2.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca_uMAP3d_dim35_res1.2.png
These graphs correspond to the 2D and 3D umaps of the data with the assignment of cells to their cluster. It corresponds to the umap chosen in the previous step.
Provided by the Clust_Markers_Annot_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/technical/sc5p_v2_hs_PBMC_1k_5gex_GE_technical_MULTI_ALL_uMAPs.png
This group of graphs corresponds to the plotting of biases on the umap. The goal of these graphs is to check the correction or not of biases. The cells should not be separated according to biases, but according to the biological processes of interest of the cells.
The very small group with 2-3 cells at the bottom of the graph has a high mitochondrial RNA level, a very low ribosomal RNA level, and a very high stress RNA level. So these are cells belonging to cluster 0, but in such a level of stress that they are separated from other cells of the same cell type.
It can be observed that there is no separation of cells according to the percentage of mitochondrial RNA, as well as for the percentage of stress RNA (except in the small group of cells at the bottom). There is a bias in the ribosomal RNAs (we had indeed observed it in the plot of the correlation between the biases and the dimensions), but it does not seem to be linked to clustering (there are cells with a high rate and the other with a low rate which coexists in the same cluster). We make the same observation for the number of transcripts and for the number of genes.
So the corrections were superfluous.
We can see no doublets and the attribution of the cell cycle is not specific to the clusters.
Notes:
- If there was a correlation between the clusters and a bias, we should go back to the normalization and dimension reduction step to correct the bias (unless the correction worsens the phenomenon, in which case there is no solution, the results must be interpreted with full knowledge of the facts).
- The other graphs in this folder (technical) are not really usefull, they correspond to the grouped graphs.
Provided by the Clust_Markers_Annot_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/found_markers/sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_upset_all.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/found_markers/sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_upset_top10.png
An upsetplot is a kind of venn diagram where you can display as many groups as you need.
Here you can see the number of marker genes specific to a cluster and the number of marker genes shared between several clusters. The first graph matches all the results and the second graphs matches the 10 marker genes with the highest logFC for each cluster.
So if 2 clusters share all their marker genes, it is a single cell type so the resolution chosen for clutering was too high.
For example, cluster 6 has only 2 specific marker genes, but it shares 5 marker genes with cluster 7 and cluster 8, it shares 4 with 8 only and 1 with 7 only. Of the 21 marker genes identified for cluster 6, only 2 are specific for cluster 6.
Provided by the Clust_Markers_Annot_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/found_markers/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_pca.35_res.1.2_findmarkers_all.txt
Ten first lines of file: genes = fmark$gene, logFC = fmark[avg_name], p_val = fmark$p_val, adj.P.Val = fmark$p_val_adj, pct.1 = fmark$pct.1, pct.2 = fmark$pct.2, tested_cluster = fmark$cluster, control_cluster = "All", min.pct = min.pct)
genes | logFC | p_val | adj.P.Val | pct.1 | pct.2 | tested_cluster | control_cluster | min.pct |
---|---|---|---|---|---|---|---|---|
RPL30 | 0.916508905160442 | 8.12479701874232e-74 | 1.37959053378245e-69 | 1 | 1 | 0 | All | 0.75 |
RPS14 | 0.872106380074094 | 1.54261075374736e-71 | 2.61935305986301e-67 | 1 | 0.998 | 0 | All | 0.75 |
RPS15A | 0.919787852083529 | 3.12110760111916e-70 | 5.29964070670034e-66 | 1 | 0.998 | 0 | All | 0.75 |
RPL32 | 0.897483269401748 | 8.26611796979938e-70 | 1.40358683127194e-65 | 1 | 1 | 0 | All | 0.75 |
RPS27 | 1.01449364815204 | 1.35926612570412e-69 | 2.3080338814456e-65 | 1 | 0.997 | 0 | All | 0.75 |
RPS3A | 0.915744210277118 | 3.95309967442229e-69 | 6.71236324716905e-65 | 1 | 0.998 | 0 | All | 0.75 |
LEF1 | 1.29790808847987 | 9.39445446762252e-69 | 1.5951783686023e-64 | 0.929 | 0.274 | 0 | All | 0.75 |
TCF7 | 1.41638327606605 | 2.64537156504439e-68 | 4.49184091744538e-64 | 0.967 | 0.427 | 0 | All | 0.75 |
RPL35A | 0.779272397622583 | 3.38257222845832e-66 | 5.74360764392222e-62 | 1 | 1 | 0 | All | 0.75 |
This table lists all the marker genes for each cluster (comparison one cluster against all the others):
- adj.P.Val > 5%,
- log2FC > 0.5 (positif log2FC only),
- pct.1 or pct.2 > 0.75 (pct.x: percentage of cells in the group x expressing the tested gene (example: 0.75 corresponds to 75% of cells); min.pct: threshold used for pct.1 and pct.2).
Provided by the Clust_Markers_Annot_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/found_markers/sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_heatmap.png
The heatmap represents the expression of the 10 best marker genes in logFC for each cluster for all cells group by cluster (also identified in the upsetplot). The expressions were normalized between the genes (scaled with a mean of 0 and an standard deviation of 1) in order to allow a visual comparison between these genes. The width of the top bar representing the cluster corresponds to the number of cells in the cluster, each column representing a cell and each row representing a gene.
This allows to verify that a marker gene is quite specific for a cluster and is not shared by other clusters. It coroborates the results of the upsetplot.
Here we can confirm that clusters 6 and 7 are very close at the expression level, as are clusters 2 and 3.
For example, close clusters can be the same cell type in 2 different activity states.
Provided by the Clust_Markers_Annot_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/found_markers/
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster0_vln.png
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster1_vln.png
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster2_vln.png
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster3_vln.png
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster4_vln.png
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster5_vln.png
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster6_vln.png
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster7_vln.png
sc5p_v2_hs_PBMC_1k_5gex_GE_findmarkers_top10_cluster8_vln.png
The violinplot represents the expression of the 10 best marker genes in logFC for each cluster (also identified in the upsetplot) by cell for all clusters. The expression of the marker gene of each cell is plotted by cluster. The greater the number of cells having the same level of expression, the larger the shape is at the level of this expression.
This allows to verify that a marker gene is quite specific for a cluster and is not shared by other clusters. It coroborates the results of the upsetplot.
Here, we observe that the CCL5 and NKG7 genes, markers of cluster 6 are also strongly expressed in clusters 7 and 8. The B2M genes, marker of cluster 6, is also expressed by all the other clusters.
Thus we can cross the results between the upsetplot, the heatmap and the violoplot in order to confirm the top10 marker genes.
Provided by the Clust_Markers_Annot_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/markers/sc5p_v2_hs_PBMC_1k_5gex_GE_markers_ALL_uMAPs.png
This is a representation of all genes from the Markfile. It can help to annotate cell types.
The very small group with 2-3 cells at the bottom of the graph seems to express very strongly and specifically the PPBP gene, a marker for Platelet cells. So maybe these are Platelet cells that suffered more from the 10X passage.
Notes:
- The other graphs in this folder (markers) correspond to the grouped graphs by signature or not grouped graphs.
- The tracing of gene expression can also be done via the cerebro application.
Provided by the Clust_Markers_Annot_GE step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/cells_annotation/clustifyr/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_uMAP_CFR_hema_microarray_matrix_clust.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/cells_annotation/clustifyr/sc5p_v2_hs_PBMC_1k_5gex_GE_SCT_uMAP_CFR_hema_microarray_matrix_cells.png
The automatic annotation is done with 2 tools (clusterifyR and singleR) with the references included in these tools. There are 2 types of annotation:
- the one made on the whole cell cluster ("_clust" suffix)
- the one made on each cell ("_cell" suffix)
Each tool assigns a score to their annotation and scores below the threshold (
cfr.minscore
andsr.minscore
parameters) are considered uncertain and annotated as "NA". Warning: the references provided with the tools are not necessarily very relevant (from microarray or bulk RNA-seq) but they are the best available at the moment. So, the results should be interpreted with caution.
Note:
It is better to group the information to annotate the cells: the automatic annotation, the marker genes and the Markfile genes.
Here, I present you 2 examples (one realized on the clusters, the other realized on each cell).
By cross-checking the results with the Markfile we can conclude that:
- cluster 0 corresponds to CD4 T Lymphocytes (Naive?),
- cluster 1 corresponds to Monocytes,
- cluster 2 corresponds to CD4 T Lymphocytes (Naive?),
- cluster 3 corresponds to CD8 T Lymphocytes (Naive?),
- cluster 4 corresponds to Monocytes,
- cluster 5 corresponds to B Lymphocytes,
- cluster 6 corresponds to T Lymphocytes (CD4? CD8? Mature?),
- cluster 7 corresponds to T Lymphocytes (CD4? CD8? Mature?),
- cluster 8 corresponds to NK cells,
- the very small group with 2-3 cells at the bottom of the graph may be Platelet cells,
- the small group with 10 cells at the top of the cluster 4 may be Dendritic Cells.
Provided by the Adding_ADT step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/ADT_results/ADT_dimplot.png
This plot shows the normalized expression of the genes (left) with the normalized expression of the corresponding proteins (right).
Often protein expression is very strong with background noise due to non-specific hybridization of the antibodies. To solve this problem we can modify the cutoff of the legend by quantiles.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/ADT_results/ADT_dimplot_legend_cutoff.png
This plot shows exactly the same things, but with a legend cutoff (default parameters).
For example we can see an improvement of the graph for the CD19 protein which shows less non-specific expression with the legend cutoff, although the default parameter is not sufficient to remove all the background noise. It should probably be increased the ADT.min.cutoff
parameter.
Provided by the Adding_TCR step.
Sometimes there are more than on TCR by cell: https://www.jimmunol.org/content/202/3/637
More information about the annotation of contigs by CellRanger: https://support.10xgenomics.com/single-cell-vdj/software/pipelines/3.1/algorithms/annotation
There are several ways to define a clonotype:
- gene: use the genes comprising the TCR.
- nt: use the nucleotide sequence of the CDR3 region.
- aa: use the amino acid sequence of the CDR3 region.
- gene+nt: use the genes comprising the TCR + the nucleotide sequence of the CDR3 region for T cells. This is the proper definition of clonotype.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/QC_quantif.png
This group of graphs represents the number of productive receptors (functional receptors) for the sample.
Here there are more productive than non-productive receptors (both graphs) and most cells have only one receptor (right graph).
This result is consistent with the usual results.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/quantUniqueContig.png
This group of graphs represents the number of different (unique) clonotypes in the sample.
Here, we can observe that almost all contigs are present in a single copy (they are unique), regardless of the definition of clonotypes chosen. This is a special case, because often there are not only unique clonotypes at the level of the TCRs, even if they are mostly unique.
For clonotypes defined only by their gene, we observe 444 unique contigs out of a total of 451 contigs.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/abundanceContig.png
This group of line graphs represents the number of clonotypes depending on the number of cells where the contig is present. The points of the graph are connected.
Here we have mostly single clonotypes, present in a single cell, which is represented by a dot at over 400 clonotypes with an abundance of 1. We also have some clonotypes present in 2 cells, which is represented by a point to near 0 clonotype with an abundance of 2. These 2 points are connected by a straight line.
In a classical case, these graphs present rather a decreasing exponential, because there are colonotypes present in more than 2 cells, so the axis of "Abundance" can label "10" or more.
For the clonotypes defined only by their gene, if we group these results with those of the quantification of the unique contigs, there are 7 contigs (451-44) present in 2 cells (2 cells maximum according to the abundance).
Another simple example:
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/clhomeo.png
By examining the clonal space, we are effectively looking at the relative space occupied by clones at specific proportions. It's like proportions of proportions.
Here all clonotypes are present in a single copy (unique) so they are present in the same proportion compared to the repertoire.
1 copy / 450 total clonotypes = 0.0022 so Medium.
2 copy / 450 total clonotypes = 0.0044 so Medium.
There are no clonotypes more represented (Hyperexpanded) and others under-represented (Rare).
Another simple example:
From the clonal population, we can count the abundance (the number of cell with this clonotype), compute the proportion of each clonotype and assign them to a category of clonotype (called Clonotype Group) as "Rare", "Small", "Medium", "Large" and "Hyperexpanded".
Then, we count the number of each category and compute their proportions. This categories' proportions are drawn on the graph.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/clprop.png
Like clonal space homeostasis above, clonal proportion acts to place clones into separate bins. The key difference is instead of looking at the relative proportion of the clone to the total, the clonalProportion() function will rank the clones by total number and place them into bins. Example: [1:10] are the top 10 clonotypes in each sample.
Here we find that the [1:10] (10 most represented clonotype) contains 11 cells: 9 with a single clonotype and 1 pair of cells with the same clonotype. For the clonotypes defined only by their gene, we observe that the [1:10] (10 most represented clonotype) contains 17 cells: 3 with a unique clonotype and 7 pairs of cells with the same clonotype. The decomposition into 9+1x2, and 3+7x2 is deduced thanks to the abundance graphs.
Another simple example:
From the clonal population, we can count the abundance (the number of cell with this clonotype), rank clonotype by abundance and group clonotypes by bins of rank (called Clonal Indices).
Then, we sum the abundance of all clonotypes by bin and compute their proportions. This bins' proportions are drawn on the graph.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/Frequency_top_10_umapsc5p_v2_hs_PBMC_1k_5gex.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/Frequency_top11to20_umapsc5p_v2_hs_PBMC_1k_5gex.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/Frequency_umapsc5p_v2_hs_PBMC_1k_5gex.png
The frequency represents the number of cells that contain the clonotype based on its amino acid sequence. The up graph represent the absolute frequency, and the bottom graph represente the frequency by bin (Single: clonotypes present in one cell; Small: clonotypes present from 2 to 5 cells; etc.)
Here, we can confirm that there is one clonotype present in 2 cells and that the other clonotypes are present in only one cell. The localization of clonotypes on the umap confirms the T cell annotation of the clusters. We also observe that not all T cells have an identified TCR.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/lengthContig.png
This graph represents the length distribution of the CDR3 sequences (nt or aa) with combined or separate chains.
The interpretation of this type of graph depends on the biological context.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/cloneType.png
This group of graphs represents several umap with the location of each part of the TCR and the size of the TRA and TRB.
The interpretation of this type of graph depends on the biological context.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/cldiv.png
This graph represents the measures the diversity of clonotypes within the sample. It is provided by 4 metrics: Shannon, inverse Simpson, Chao1, and Abundance-based Coverage Estimator (ACE).
The interpretation of this type of graph depends on the biological context.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Global_analysis/aaProperties.png
This group of graphs represents a list the physicochemical properties of the amino acids that make up the receptors.
- Length: total amino acid count.
- Hydrophobicity: grand average of hydrophobicity.
- Bulk: average bulkiness.
- Aliphatic: normalized aliphatic index.
- Polarity: average polarity.
- Charge: normalized net charge.
- Basic: basic side chain residue content.
- Acidic: acidic side chain residue content.
- Aromatic: aromatic side chain content. More information here
The interpretation of this type of graph depends on the biological context.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/clust_quantContig_sc5p_v2_hs_PBMC_1k_5gex.png
This group of graphs are similar to that of the globale analysis, but by clusters, so the interpretation is similar too.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/abundanceContig.png
This group of graphs are similar to that of the globale analysis, but by clusters, so the interpretation is similar too.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/clust_clhomeosc5p_v2_hs_PBMC_1k_5gex.png
This group of graphs are similar to that of the globale analysis, but by clusters, so the interpretation is similar too.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/clust_clpropsc5p_v2_hs_PBMC_1k_5gex.png
This group of graphs are similar to that of the globale analysis, but by clusters, so the interpretation is similar too.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/Frequency_top_10_clust0_umapsc5p_v2_hs_PBMC_1k_5gex.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/Frequency_top_10_clust2_umapsc5p_v2_hs_PBMC_1k_5gex.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/Frequency_top_10_clust3_umapsc5p_v2_hs_PBMC_1k_5gex.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/Frequency_top_10_clust6_umapsc5p_v2_hs_PBMC_1k_5gex.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/Frequency_top_10_clust7_umapsc5p_v2_hs_PBMC_1k_5gex.png
This group of graphs are similar to that of the globale analysis, but by clusters, so the interpretation is similar too.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/clust_clOverlap_sc5p_v2_hs_PBMC_1k_5gex.png
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/overlap_aa_sc5p_v2_hs_PBMC_1k_5gex.txt
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/overlap_gene_nt_sc5p_v2_hs_PBMC_1k_5gex.txt
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/overlap_gene_sc5p_v2_hs_PBMC_1k_5gex.txt
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/overlap_nt_sc5p_v2_hs_PBMC_1k_5gex.txt
example of overlap_gene_sc5p_v2_hs_PBMC_1k_5gex.txt
For more readability the columns tested_clust_list_sequences and ref_clust_list_sequences have been replaced by "...".tested_clust | nb_unique_seq_tested_clust | tested_clust_list_sequences | ref_clust | nb_unique_seq_ref_clust | ref_clust_list_sequences | nb_overlap | overlap_list_sequences |
---|---|---|---|---|---|---|---|
0 | 233 | … | 2 | 76 | … | 0 | NA |
0 | 233 | … | 3 | 83 | … | 1 | NA_TRBV2.TRBJ2-7.None.TRBC2 |
0 | 233 | … | 6 | 32 | … | 0 | NA |
0 | 233 | … | 7 | 27 | … | 2 | TRAV8-3.TRAJ37.TRAC_TRBV20-1.TRBJ1-1.None.TRBC1;NA_TRBV6-2.TRBJ2-2.None.TRBC2 |
2 | 76 | … | 0 | 233 | … | 0 | NA |
2 | 76 | … | 3 | 83 | … | 0 | NA |
2 | 76 | … | 6 | 32 | … | 0 | NA |
2 | 76 | … | 7 | 27 | … | 0 | NA |
3 | 83 | … | 0 | 233 | … | 1 | NA_TRBV2.TRBJ2-7.None.TRBC2 |
3 | 83 | … | 2 | 76 | … | 0 | NA |
3 | 83 | … | 6 | 32 | … | 0 | NA |
3 | 83 | … | 7 | 27 | … | 1 | NA_TRBV4-2.TRBJ2-6.None.TRBC2 |
6 | 32 | … | 0 | 233 | … | 0 | NA |
6 | 32 | … | 2 | 76 | … | 0 | NA |
6 | 32 | … | 3 | 83 | … | 0 | NA |
6 | 32 | … | 7 | 27 | … | 0 | NA |
7 | 27 | … | 0 | 233 | … | 2 | NA_TRBV6-2.TRBJ2-2.None.TRBC2;TRAV8-3.TRAJ37.TRAC_TRBV20-1.TRBJ1-1.None.TRBC1 |
7 | 27 | … | 2 | 76 | … | 0 | NA |
7 | 27 | … | 3 | 83 | … | 1 | NA_TRBV4-2.TRBJ2-6.None.TRBC2 |
7 | 27 | … | 6 | 32 | … | 0 | NA |
The graph represents the coefficient of common clonotypes between 2 clusters (number of common clonotypes scaled to the number of unique clonotypes in the smaller cluster). The tables present the number and the sequence of common clonotypes between the clusters.
According to the graph, for the clonotypes defined only by their gene, 7.7% of clonotypes from cluster 7 are shared by the clusters 0; and there is no shared clonotypes between the clusters 0 and 6. According to the text file with the clonotypes defined only by their gene, 2 clonotypes are shared by the clusters 0 and 7; and there is no shared clonotypes between the clusters 0 and 6.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/clust_cldivsc5p_v2_hs_PBMC_1k_5gex.png
This group of graphs are similar to that of the globale analysis, but by clusters, so the interpretation is similar too.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/TCR_results/Clusters_analysis/aaProperties_sc5p_v2_hs_PBMC_1k_5gex.png
This group of graphs are similar to that of the globale analysis, but by clusters, so the interpretation is similar too.
Provided by the Adding_BCR step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/BCR_results/
The results provided for the BCRs are the same as for the TCR analysis. So I will not explain again. We observe that all the clonotypes are unique except one. In addition, the clonotypes colocalize well with the cluster of B lymphocytes identified with the annotation. Cluster analysis is not very interesting because we have only one cluster of B lymphocytes in this sample.
Provided by the Cerebro step.
/mnt/beegfs/userdata/m_aglave/pipeline/examples/complete_individual_analysis_example_of_wiki/Results/sc5p_v2_hs_PBMC_1k_5gex_GE/F200_C1000_M0-0.15_R0-1_G5/DOUBLETSFILTER_all/SCTransform/pca/dims35_res1.2/sc5p_v2_hs_PBMC_1k_5gex_GE_SCTransform_pca_35_1.2_ADT_TCR_BCR.crb
Cerebro file can be loaded into CerebroApp R Shiny to exploit the main results.
Resources of the Theory of single cell RNA-seq
v1.3
Pipeline details
Configuration
-
Parameter file
- Steps
- Alignment_countTable_GE
- Droplets_QC_GE
- Filtering_GE
- Norm_DimRed_Eval_GE
- Clust_Markers_Annot_GE
- Cerebro
- Alignment_countTable_ADT
- Adding_ADT
- Alignment_annotations_TCR_BCR
- Adding_TCR
- Adding_BCR
- Int_Norm_DimRed_Eval_GE
- Int_Clust_Markers_Annot_GE
- Int_Adding_ADT
- Int_Adding_TCR
- Int_Adding_BCR
- Grp_Norm_DimRed_Eval_GE
- Grp_Clust_Markers_Annot_GE
- Grp_Adding_ADT
- Grp_Adding_TCR
- Grp_Adding_BCR
- Additional files
Results help
- Arborescence of all results
-
Observations and weird results
- Not a threshold by emptyDrops
- Large and small cells into the same sample
- emptyDrops does't work well
- More than 15% mitochondrial RNA while I filtered them out at 15%
- Impact of empty droplets on umap
- Choose the right number of dimensions
- Be careful with the colors, they are sometimes misleading
- Impact of bias correction on umap
Complete Examples of school cases
Individual analysis :
1 sample (scRNA-seq + ADT + TCR + BCR)
Grouped/Integrated analysis :
2 samples (scRNA-seq + ADT + TCR + BCR)