From 82c5d1e808cd7a50e0b6781ed2d8699f3e731ff8 Mon Sep 17 00:00:00 2001
From: Lilly
Date: Sat, 9 Mar 2024 16:21:55 +0100
Subject: [PATCH] Finished cleaning up integration pipeline.yml

---
 docs/yaml_docs/pipeline_integration_yml.md | 95 +++++----
 .../pipeline_integration/pipeline.yml | 197 +++++-------------
 2 files changed, 103 insertions(+), 189 deletions(-)

diff --git a/docs/yaml_docs/pipeline_integration_yml.md b/docs/yaml_docs/pipeline_integration_yml.md
index ec0d7c59..63cc9466 100644
--- a/docs/yaml_docs/pipeline_integration_yml.md
+++ b/docs/yaml_docs/pipeline_integration_yml.md
@@ -219,15 +219,16 @@ Parameters to compute the connectivity graph on Protein
 Defines if you want the batch correction to run. If set to `False`, `PCA` with default parameters is calculated.
 - dimred `String`, Default: PCA
- Defines if which dimensionality reduction to use, PCA or LSI + Defines which dimensionality reduction to use. Available options are PCA and LSI. - tools `String` (comma-separated), Default: harmony
- Defines the method used to run batch correction, multiple can be selected. - choices: harmony, bbknn + Defines the method used to run batch correction. + Multiple can be selected by specifying them as a comma-separated string without spaces. + Available options are: harmony, bbknn, and combat - column `String` (comma-separated), Default: sample_id
- - The column you want to batch correct on, if a comma-separated list is specified then all will be used simultaneously + The column you want to batch correct on. + If a comma-separated list is provided then all will be used simultaneously. #### Harmony arguments @@ -241,23 +242,21 @@ Parameters to compute the connectivity graph on Protein For more information on `harmony` check the [harmony documentation](https://portals.broadinstitute.org/harmony/reference/RunHarmony.html) -### BBKNN arguments - - +#### BBKNN arguments - bbknn: - neighbors_within_batch: `Integer`, Default: 3
-For more information on `bbknn` check the [bbknn documentation](https://bbknn.readthedocs.io/en/latest/) +For more information on `bbknn` check the [bbknn documentation](https://bbknn.readthedocs.io/en/latest/). -### Find neighbour parameters +#### Find neighbour parameters - neighbors: `String`
- npcs `Integer`, Default: 30
- Number of principal components to calculate for neighbors and Umap + Number of principal components to calculate for neighbors and UMAP. - k `Integer`, Default: 30
Number of neighbors @@ -273,21 +272,22 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re multimodal: - run `Boolean`, Default: True
- Leave False if you don't want to run multimodal integration + Set to False if you don't want to run multimodal integration - tools `String`(Comma separated), Default: "WNN"
Method you want to use to run batch correction. Options include: WNN, totalvi and multiVI. You can specify mutiple methods and they will be run simultaneously. - column_categorical `String`(Comma separated), Default: sample_id
- This is the column you want to run a batch correction on, multiple can be selected simultaneously. + This is the column you want to run a batch correction on. + Multiple columns can be selected simultaneously by providing them as a comma-separated string without spaces. Extra parameters: ### TotalVI arguments - **totalvi has to run on both rna and protein data** + **TotalVI has to run on both rna and protein data** - These are the basic totalvi parameters required, you can add more if it fits your analysis better. + This is the minimal set of TotalVI parameters required, you can add more if it fits your analysis better. - totalvi: @@ -296,8 +296,7 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re - exclude_mt_genes `Boolean`, Default: True
- mt_column `String`, Default: mt
- filter_by_hvg `Boolean`, Default: True
- - To filter manually create a column called prot_outliers in mdata['prot'] + To filter manually create a column called prot_outliers in mdata['prot'] - filter_prot_outliers `Boolean`, Default: False
- model_args: @@ -313,9 +312,10 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re **MultiVI has to run on both rna and atac data** - These are the basic multivi parameters required, you can add more if it fits your analysis better. + This is the minimal set of MultiVI parameters required, you can add more if it fits your analysis better. - By setting lowmen to True it will subset the atac to the top 25k HVF which is recommended to deal with the concatenation of atac and rna on large datasets which at the moment is required by `scvi-tools`. Note that >100GB of RAM are required to concatenate atac,rna with 15k cells and 120k total features (union rna,atac) + Setting `lowmem` to True will subset the ATAC data to the top 25k HVF, which is recommended to deal with the concatenation of ATAC and RNA data on large datasets that is currently required by `scvi-tools`. + Note that >100GB of RAM are required to concatenate ATAC and RNA data with 15k cells and 120k total features (union rna,atac) - MultiVI: @@ -331,30 +331,29 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re - max_epochs `Integer`, Default: 500
- lr `Float`, Default: 0.0001
- use_gpu `String`, Default: None
- Leave blank for default str, int and bool. + Leave blank for default str, int and bool. - train_size `Float`, Default: 0.9
- validation_size `String`, Default: None
- Leave blank for default + Leave blank for default - batch_size `Integer`, Default: 128
- weight_decay `Float`, Default: 0.001
- eps `Float`, Default: 1e-08
- early_stopping `Boolean`, Default: True
- save_best `Boolean`, Default: True
- check_val_every_n_epoch `String`, Default: None
- Leave blank for the default integer + Leave blank for the default integer - n_steps_kl_warmup `String`, Default: None
- Leave blank for the default integer + Leave blank for the default integer - n_epochs_kl_warmup `Integer`, Default: 50
- adversarial_mixing `Boolean`, Default: True
- training_plan `String`, Default: None
-### Mofa +### Mofa arguments **Requires at least two modalities, can run with three** - These are the basic mofa parameters required, you can add more if it fits your analysis better. - + This is the minimal set of Mofa parameters required, you can add more if it fits your analysis better. - mofa: - modalities `String` (Comma separated), Default: rna,prot,atac
@@ -362,21 +361,23 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re - n_factors `Integer`, Default: 10
- n_iterations `Integer`, Default: 1000
- convergence_mode `String`, Default: fast
- Choice between fast, medium, and slow + Choice between fast, medium, and slow - save_parameters `Boolean`, Default: False
- outfile `String`, Default: `path/to/h5ad/to_save_model_to`
-### WNN +### WNN arguments **Requires at least two modalities, can run with three** - These are the basic WNN parameters required, you can add more if it fits your analysis better. + This is the minimal set of WNN parameters required, you can add more if it fits your analysis better. + Panpipes uses muon's implementation of WNN. -- WNN: +- WNN: + - modalities `String` (Comma separated), Default: rna, prot, atac
- batch_corrected `String`, Default: None
- Set the modality to one method ("bbknn", "scVI", "harmony", "scanorama"), if left None, a default de novo calculation of neighbours on non-corrected data for that modality using specified parameters + Set the modality to one method ("bbknn", "scVI", "harmony", "scanorama"). If left None, neighbours are calculated de novo on non-corrected data for that modality using the specified parameters. - rna `String`, Default: None
Options here include "bbknn" and "harmony" @@ -391,7 +392,7 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re - atac `String`, Default: *atac_neighbors
- n_neighbors `String`, Default: "leave blank"
- Leave blank to arithmetic mean across modalities neighbors + Leave blank to use the arithmetic mean across the modalities' neighbors - n_bandwidth_neighbors `Integer`, Default: 20
@@ -401,11 +402,13 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re - low_memory `Boolean`, Default: True
- +### KNN calculation for multimodal analysis - neighbors: - npcs `Integer`, Default: 30
- - The number of principal components to calculate for neighbors and umap. If no correction is applied PCA will be calculated and used to run the UMAP. If harmony is chosen it will use the following components to create a corrected dimensionality reduction + The number of principal components to calculate for neighbors and UMAP. + If no correction is applied PCA will be calculated and used to run the UMAP. + If harmony is chosen it will use the following components to create a corrected dimensionality reduction. + - k `Integer`, Default: 30
- metric `String`, Default: euclidean
Options include euclidean and cosine @@ -414,27 +417,31 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re Options include scanpy and hnsw -### Plot +## Plotting parameters -- plotqc: -Grouping must be a categorical variable +- plotqc:
- grouping_var `String`, Default: sample_id
+ Column name(s) of the covariate(s) you want to group the plot on. Must be a categorical variable. + Must be provided as a comma-separated String, without spaces. + +Specify other metrics you want to plot on each modality's embedding. One plot per group will be created. +Use the mod:variable notation. +These can be categorical or numeric variables. +Any metrics you may want to plot on all modality UMAPs should be listed under `all`. - all `String`, Default: rep:receptor_subtype
- - Any metrics you may want to plot on all modality umaps should be listed under all the modalities - - rna `String`, Default: rna:total_counts
- prot `String`, Default: prot:total_counts
- atac `String`, Default: atac:total_counts
- multimodal `String`, Default: rna:total_counts
+If you want to add any additional plots, simply remove the log file (logs/plot_batch_corrected_umaps.log) and run `panpipes integration make plot_umaps`. - -### Make final object +## Creating the final object Leave this final option blank until you have reviewed the results from running `papipes integration make full`. -This step will produce a mudata object with one layer and one correction per modality, and one multimodal layer. For unimodal integration select the uncorrected version and use "no_correction". +This step will produce a mudata object with one layer and one correction per modality, and one multimodal layer. +For unimodal integration select the uncorrected version and use "no_correction". **Then run**`panpipes integration make merge_integration` diff --git a/panpipes/panpipes/pipeline_integration/pipeline.yml b/panpipes/panpipes/pipeline_integration/pipeline.yml index bde60087..a3cea274 100644 --- a/panpipes/panpipes/pipeline_integration/pipeline.yml +++ b/panpipes/panpipes/pipeline_integration/pipeline.yml @@ -107,80 +107,46 @@ prot: # ------------- # ATAC modality atac: - # True or false depending on whether you want to run batch correction run: False - # which dimensionality reduction to expect, LSI or PCA - dimred: PCA - # what method(s) to use to run batch correction, you can specify multiple - # (comma-seprated string, no spaces) - # choices: harmony,bbknn,combat - tools: - # this is the column you want to batch correct on. if you specify a comma separated list, - # they will be all used simultaneosly. if you want to test correction for one at a time, - # specify one at a time and run the pipeline in different folders i.e. integration_by_sample, - # integration_by_tissue ... 
+ dimred: PCA + tools: column: sample_id - #---------------------------- + # Harmony args - #----------------------------- harmony: - # sigma value, used by Harmony - sigma: 0.1 - # theta value used by Harmony, default is 1 + sigma: 0.1 theta: 1.0 - # number of pcs, used by Harmony npcs: 30 - #---------------------------- + # BBKNN args # https://bbknn.readthedocs.io/en/latest/ - #----------------------------- bbknn: neighbors_within_batch: - #---------------------------- - # find neighbour parameters - #----------------------------- + + # Find neighbour parameters neighbors: &atac_neighbors - # number of Principal Components to calculate for neighbours and umap: - # -if no correction is applied, PCA will be calculated and used to run UMAP and clustering on - # -if Harmony is the method of choice, it will use these components to create a corrected dim red.) - # note: scvelo default is 30 npcs: 30 - # number of neighbours k: 30 - # metric: euclidean | cosine metric: euclidean - # scanpy | hnsw (from scvelo) method: scanpy -#---------------------------------------------- + + +#----------------------- # multimodal integration +# ---------------------- # remember to specify knn graph params in the section "neighbors" -#---------------------------------------------- multimodal: - # True or false depending on whether you want to run batch correction - run: True - # what method(s) to use to run batch correction, you can specify multiple - # choices: totalvi, mofa, MultiVI, WNN - # list e.g. below + run: True tools: - WNN - totalvi - multiVI - - # this is the column you want to batch correct on. if you specify a comma separated list, - # they will be all used simultaneosly. if you want to test correction for one at a time, - # specify one at a time and run the pipeline in different folders i.e. integration_by_sample, - # integration_by_tissue ... 
column_categorical: sample_id - # extra params: + + # TotalVI arguments totalvi: - # this is a minimal set of parameters that will be expected - # you can add any other param from the tutorials and they will - # be parsed alongside the others - - # totalvi will run on rna and prot modalities: rna,prot exclude_mt_genes: True mt_column: mt - # to filter outliers manually create a column called prot_outliers in mdata['prot'].obs filter_by_hvg: True filter_prot_outliers: False model_args: @@ -190,148 +156,90 @@ multimodal: train_size: 0.9 early_stopping: True training_plan: None + + # MultiVI arguments MultiVI: - # this is a minimal set of parameters that will be expected - # you can add any other param from the tutorials and they will - # be parsed alongside the others - # leave arguments blank for default lowmem: True - # Set lowmem to True will subset the atac to the top 25k HVF. - # This is to deal with concatenation of atac,rna on large datasets which at the moment is suboptimally required by scvitools. 
- # >100GB of RAM are required to concatenate atac,rna with 15k cells and 120k total features (union rna,atac) model_args: - # (default: None) - n_hidden : - # (default: None) - n_latent : - #(bool,default: True) - region_factors : True - #{‘normal’, ‘ln’} (default: 'normal') + n_hidden : + n_latent : + region_factors : True latent_distribution : 'normal' - #(bool,default: False) - deeply_inject_covariates : False - #(bool, default: False) - fully_paired : False + deeply_inject_covariates : False + fully_paired : False + training_args: - #(default: 500) - max_epochs : 500 - #float (default: 0.0001) - lr : 0.0001 - #leave blanck for default str | int | bool | None (default: None) + max_epochs : 500 + lr : 0.0001 use_gpu : - # float (default: 0.9) - train_size : 0.9 - # leave blanck for default, float | None (default: None) - validation_size : - # int (default: 128) + train_size : 0.9 + validation_size : batch_size : 128 - #float (default: 0.001) - weight_decay : 0.001 - #float (default: 1e-08) - eps : 1e-08 - #bool (default: True) - early_stopping : True - #bool (default: True) + weight_decay : 0.001 + eps : 1e-08 + early_stopping : True save_best : True - #leave blanck for default int | None (default: None) check_val_every_n_epoch : - #leave blanck for default int | None (default: None) - n_steps_kl_warmup : - # int | None (default: 50) + n_steps_kl_warmup : n_epochs_kl_warmup : 50 - #bool (default: True) - adversarial_mixing : True - #leave blanck for default dict | None (default: None) + adversarial_mixing : True training_plan : + + # Mofa arguments mofa: - # this is a minimal set of parameters that will be expected - # you can add any other param from the tutorials and they will - # be parsed alongside the others - # (comma-separated string, no spaces) modalities: rna,prot,atac filter_by_hvg: True n_factors: 10 n_iterations: 1000 - #pick one among fast, medium, slow convergence_mode: fast save_parameters: False - #if save_parameters True, set the following, 
otherwise leave blank outfile: path/to/h5ad/to_save_model_to + + # WNN arguments WNN: - # muon implementation of WNN - modalities: rna,prot,atac - # run wnn on batch corrected unimodal data, set each of the modalities you want to use to calc WNN to ONE method. - # leave to None and it will default to de novo calculation of neighbours on non corrected data for that modality using specified params + modalities: rna,prot,atac batch_corrected: - # options are: "bbknn", "scVI", "harmony", "scanorama" rna: None - # options are "harmony", "bbknn" prot: None - # options are "harmony" - atac: None - # please use anchors (&) and scalars (*) in the relevant place - # i.e. &rna_neighbors will be called by *rna_neighbors where referenced + atac: None + + # please use anchors (&) and scalars (*) if necessary knn: rna: *rna_neighbors prot: *prot_neighbors atac: *atac_neighbors - #WNN has its own neighbors search, specify here - n_neighbors: #leave blank and it will default to aritmetic mean across modalities neighbors + + # WNN neighbour search + n_neighbors: n_bandwidth_neighbors: 20 n_multineighbors: 200 metric: 'euclidean' low_memory: True - - ### - # neighbours knn calculation for multimodal analysis. - ### + + # KNN calculation for multimodal analysis neighbors: - # number of Principal Components to calculate for neighbours and umap: - # -if no correction is applied, PCA will be calculated and used to run UMAP and clustering on - # -if Harmony is the method of choice, it will use these components to create a corrected dim red.) 
- # note: scvelo default is 30 npcs: 30 - # number of neighbours k: 30 - # metric: euclidean | cosine metric: euclidean - # scanpy | hnsw (from scvelo) method: scanpy - -#----------------------------- -# Plot -#----------------------------- +#-------------------- +# Plotting parameters +#-------------------- plotqc: - # grouping var must be a categorical varible, - # (comma-seprated strings, no spaces) - # umaps comparing the integration (one plot per value in the group) - # for each batch correction column plus any extras in grouping var grouping_var: sample_id - # what other metrics do you want to plot on each modalities embedding, (one plot per group) - # use mod:variable notation, - # any metrics that you want to plot on all modality umaps go under "all" - # these can be categorical or numeric + all: rep:receptor_subtype rna: rna:total_counts prot: prot:total_counts atac: multimodal: rna:total_counts - # if you want to add any additional plots, just remove the log file logs/plot_batch_corrected_umaps.log - # and run panpipes integration make plot_umaps -# ---------------- -# Make final object -# ---------------- -# Final choices: Leave blank until you have reviewed the results from running -# panpipes integration make full -# This step will produce a mudata object with one layer per modality with -# one correction per modality and one multimodal layer. -# Choose the integration results you want to merge in the final object -# For unimodal integration: to pick the uncorrected version use "no_correction" -# then run -# panpipes integration make merge_integration + +# ------------------------- +# Creating the final object +# ------------------------- final_obj: rna: include: True @@ -345,4 +253,3 @@ final_obj: multimodal: include: True bc_choice: totalvi -