Commit

Merge pull request #202 from DendrouLab/clustering_g
clustering yaml created
bio-la authored Apr 26, 2024
2 parents d53c976 + c78eea5 commit 844e4a0
Showing 5 changed files with 52 additions and 29 deletions.
3 changes: 2 additions & 1 deletion docs/yaml_docs/index.rst
@@ -12,4 +12,5 @@ Workflows configuration files
spatial_qc
spatial_preprocess
spatial_deconvolution
pipeline_refmap_yml.md
pipeline_refmap_yml

66 changes: 41 additions & 25 deletions docs/yaml_docs/pipeline_clustering_yml.md
@@ -14,7 +14,10 @@ In this documentation, the parameters of the `clustering` configuration yaml fil
This file is generated running `panpipes clustering config`. <br>
The individual steps run by the pipeline are described in [clustering workflow](https://panpipes-pipelines.readthedocs.io/en/latest/workflows/clustering.html)

When running the clustering workflow, panpipes provides a basic `pipeline.yml` file.
The `clustering` workflow works with outputs generated by the `integration` workflow, and expects a `MuData` object with
`neighbors` saved in the `.uns` of the global layer to run clustering on the multimodal embedding. If `neighbors` are calculated on each modality layer, these can be reused or re-calculated on the fly.

When running the clustering workflow, panpipes provides a basic `pipeline.yml` file to customize with parameters.
To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.

However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html).
@@ -62,24 +65,30 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th
Specify the full object if your scaled_obj contains only HVG. If your scaled_obj contains all the genes then leave full_obj blank.
panpipes will use the full object to do marker genes analysis (rank_gene_groups) and for plotting those genes.
- <span class="parameter">modalities</span><br>
- <span class="parameter">rna</span> `Boolean`, Default: True<br>
Which modalities to run clustering on.
- <span class="parameter">rna</span> `Boolean`, Default: True<br> If set to `True`, the workflow will stop if it doesn't find a modality named 'rna'
- <span class="parameter">prot</span> `Boolean`, Default: True<br>
If set to `True`, the workflow will stop if it doesn't find a modality named 'prot'
- <span class="parameter">atac</span> `Boolean`, Default: False<br>
If set to `True`, the workflow will stop if it doesn't find a modality named 'atac'

- <span class="parameter">spatial</span> `Boolean`, Default: False<br>
Run clustering on each individual modality.
If set to `True`, the workflow will stop if it doesn't find a modality named 'spatial'


- <span class="parameter">multimodal</span><br>
- <span class="parameter">rna_clustering</span> `Boolean`, Default: True<br>
- <span class="parameter">integration_method</span> `String`, Default: WNN<br>
Options here include WNN, mofa, and totalVI, and it tells us where to look for.
- <span class="parameter">rna_clustering</span> `Boolean`, Default: False<br> If set to True, runs clustering on multimodal embedding
- <span class="parameter">integration_method</span> `String`, Default: None<br>
In case you have run WNN and want to run clustering on the wnn embedding, specify "WNN" here. The neighbours are saved with a different `--neighbors_key` param only for wnn; for every other method (totalvi, multivi, mofa), leave this parameter blank.


## Parameters for finding neighbours

- <span class="parameter">neighbors:</span>
Sets the number of neighbors to use when calculating the graph for clustering and umap.
- <span class="parameter">rna:</span>

- <span class="parameter">use_existing </span> `Boolean`, Default: True<br>
- <span class="parameter">use_existing </span> `Boolean`, Default: True<br> Use existing neighbours in .uns calculated in the `integration` workflow. If `False`, it will recalculate using the following parameters
- <span class="parameter">dim_red </span> `String`, Default: X_pca<br>
Defines which representation in .obsm to use for nearest neighbors
- <span class="parameter">n_dim_red</span> `Integer`, Default: 30<br>
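The `modalities` flags documented in this hunk act as a guard: a modality set to `True` makes the workflow stop if the MuData object has no modality of that name. The sketch below illustrates that check under stated assumptions: the flags arrive as a plain dict, and the available names are a set (for example the keys of a MuData object's `.mod` mapping). `check_modalities` is a hypothetical helper, not a panpipes function.

```python
def check_modalities(flags: dict[str, bool], available: set[str]) -> list[str]:
    """Return the modalities to cluster; raise if a requested one is missing.

    Hypothetical helper: a modality flagged True must exist in the
    MuData object, otherwise the run stops.
    """
    requested = [mod for mod, run in flags.items() if run]
    missing = [mod for mod in requested if mod not in available]
    if missing:
        raise ValueError(f"modalities not found in MuData: {', '.join(missing)}")
    return requested


# Flags mirroring the documented defaults (rna/prot on, atac/spatial off)
flags = {"rna": True, "prot": True, "atac": False, "spatial": False}
print(check_modalities(flags, {"rna", "prot", "atac"}))  # ['rna', 'prot']
```

Failing early, before any neighbours or clustering jobs start, keeps a misconfigured run from producing partial outputs.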
@@ -94,7 +103,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th

- <span class="parameter">prot:</span>

- <span class="parameter">use_existing </span> `Boolean`, Default: True<br>
- <span class="parameter">use_existing </span> `Boolean`, Default: True<br> Use existing neighbours in .uns calculated in the `integration` workflow. If `False`, it will recalculate using the following parameters
- <span class="parameter">dim_red </span> `String`, Default: X_pca<br>
Defines which representation in .obsm to use for nearest neighbors
- <span class="parameter">n_dim_red</span> `Integer`, Default: 30<br>
@@ -109,7 +118,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th

- <span class="parameter">atac:</span>

- <span class="parameter">use_existing </span> `Boolean`, Default: True<br>
- <span class="parameter">use_existing </span> `Boolean`, Default: True<br> Use existing neighbours in .uns calculated in the `integration` workflow. If `False`, it will recalculate using the following parameters
- <span class="parameter">dim_red </span> `String`, Default: X_lsi<br>
Defines which representation in .obsm to use for nearest neighbors
- <span class="parameter">n_dim_red</span> `Integer`, Default: 1<br>
@@ -125,7 +134,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th

- <span class="parameter">spatial:</span>

- <span class="parameter">use_existing </span> `Boolean`, Default: False<br>
- <span class="parameter">use_existing </span> `Boolean`, Default: False<br> Use existing neighbours in .uns calculated in the `integration` workflow. If `False`, it will recalculate using the following parameters
- <span class="parameter">dim_red </span> `String`, Default: X_pca<br>
Defines which representation in .obsm to use for nearest neighbors
- <span class="parameter">n_dim_red</span> `Integer`, Default: 30<br>
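The per-modality `neighbors` blocks above choose between reusing the kNN graph stored in `.uns` by the `integration` workflow (`use_existing: True`) and recomputing it from `dim_red`, `n_dim_red`, `k` and `metric`. A minimal sketch of that decision, assuming the YAML keys map onto `scanpy.pp.neighbors` arguments as shown. The mapping and the fallback to recomputing when no stored graph exists are assumptions for illustration, not documented panpipes behaviour, and `neighbors_plan` is a hypothetical helper.

```python
from typing import Optional


def neighbors_plan(cfg: dict, uns_keys: set[str],
                   neighbors_key: str = "neighbors") -> Optional[dict]:
    """Return None to reuse the stored kNN graph, else kwargs to recompute it.

    Assumed mapping of the YAML keys onto scanpy.pp.neighbors arguments;
    falling back to recomputation when nothing is stored is also an assumption.
    """
    if cfg.get("use_existing", False) and neighbors_key in uns_keys:
        return None  # reuse the graph computed by the integration workflow
    return {
        "use_rep": cfg["dim_red"],    # representation in .obsm, e.g. X_pca
        "n_pcs": cfg["n_dim_red"],    # how many dimensions of it to use
        "n_neighbors": cfg["k"],      # neighbours per cell
        "metric": cfg["metric"],      # distance metric, e.g. euclidean
    }


rna_cfg = {"use_existing": True, "dim_red": "X_pca", "n_dim_red": 30,
           "k": 30, "metric": "euclidean", "method": "scanpy"}
print(neighbors_plan(rna_cfg, {"neighbors"}))  # None
print(neighbors_plan(rna_cfg, set()))          # kwargs dict: nothing stored, so recompute
```

When a dict is returned, it could be unpacked as `sc.pp.neighbors(adata, **plan)`, since `use_rep`, `n_pcs`, `n_neighbors` and `metric` are all parameters of that scanpy function.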
@@ -142,51 +151,51 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th

- <span class="parameter">umap:</span>

- <span class="parameter">run </span> `Boolean`, Default: True<br>
- <span class="parameter">run </span> `Boolean`, Default: True<br> If set to `True`, runs the umap calculation and plotting.
- <span class="parameter">rna:</span>
- <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
Can specify an array: 0.25,0.5
Can specify a single float or an array: 0.25,0.5
- <span class="parameter">prot:</span>
- <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
Can specify an array: 0.25,0.5,0.8
Can specify a single float or an array: 0.25,0.5,0.8
- <span class="parameter">atac:</span>
- <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
Can specify an array: 0.25,0.5,0.8
Can specify a single float or an array: 0.25,0.5,0.8
- <span class="parameter">multimodal:</span>
- <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
Can specify an array: 0.25,0.5,0.8
Can specify a single float or an array: 0.25,0.5,0.8
- <span class="parameter">rna:</span>
- <span class="parameter">mindist </span> `Float`, Default: 0.5<br>
Can specify an array: 0.25,0.5,0.8
Can specify a single float or an array: 0.25,0.5,0.8

## Parameters for clustering

- <span class="parameter">clusterspecs:</span>
- <span class="parameter">rna:</span>
- <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
Can specify an array: 0.2,0.6,1
Can specify a single float or an array: 0.2,0.6,1
- <span class="parameter">algorithm</span> `String`, Default: leiden<br>
Options include louvain or leiden.
- <span class="parameter">prot:</span>
- <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
Can specify an array: 0.2,0.6,1
Can specify a single float or an array: 0.2,0.6,1
- <span class="parameter">algorithm</span> `String`, Default: leiden<br>
Options include louvain or leiden.

- <span class="parameter">atac:</span>
- <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
Can specify an array to compute in parallel: 0.2,0.6,1
Can specify a single float or an array to compute in parallel: 0.2,0.6,1
- <span class="parameter">algorithm</span> `String`, Default: leiden<br>
Options include louvain or leiden.
- <span class="parameter">multimodal:</span>
- <span class="parameter">resolutions </span> `Float`, Default: 0.5, 0.7<br>
Can specify an array to compute in parallel: 0.2,0.6,1
Can specify a single float or an array to compute in parallel: 0.2,0.6,1
- <span class="parameter">algorithm</span> `String`, Default: leiden<br>
Options include louvain or leiden.

- <span class="parameter">spatial:</span>
- <span class="parameter">resolutions </span> `Float`, Default: 0.2, 0.6, 1<br>
Can specify an array to compute in parallel: 0.2,0.6,1
Can specify a single float or an array to compute in parallel: 0.2,0.6,1
- <span class="parameter">algorithm</span> `String`, Default: leiden<br>
Options include louvain or leiden.
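Both the umap `mindist` and the clustering `resolutions` parameters in this hunk accept either a single float or a comma-separated array whose values are computed in parallel. A YAML loader such as PyYAML returns `0.5` as a float but `0.25,0.5` as the string `"0.25,0.5"`, so a consumer has to normalise both forms. The sketch below does that and expands each modality's spec into independent jobs; the helper names and the `(modality, algorithm, resolution)` tuple layout are hypothetical choices for illustration, not the panpipes implementation.

```python
def as_float_list(value) -> list[float]:
    """Normalise a single number, a comma-separated string, or a list to floats."""
    if isinstance(value, (int, float)):
        return [float(value)]
    if isinstance(value, str):
        # float() tolerates surrounding spaces, so "0.2, 0.6, 1" also works
        return [float(v) for v in value.split(",") if v.strip()]
    return [float(v) for v in value]


def clustering_jobs(clusterspecs: dict) -> list[tuple[str, str, float]]:
    """Expand per-modality specs into independent (modality, algorithm, resolution) jobs."""
    jobs = []
    for mod, spec in clusterspecs.items():
        for res in as_float_list(spec["resolutions"]):
            jobs.append((mod, spec["algorithm"], res))
    return jobs


# Values as a YAML loader would return them for the documented forms
specs = {
    "rna": {"resolutions": "0.2,0.6,1", "algorithm": "leiden"},
    "multimodal": {"resolutions": "0.5,0.7", "algorithm": "leiden"},
}
print(as_float_list(0.5))           # [0.5]
print(len(clustering_jobs(specs)))  # 5
```

Because each tuple depends only on its own modality graph and resolution, the jobs can be dispatched independently, which is what makes parallel computation over an array of values possible.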

@@ -207,8 +216,10 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
Marker analysis is run for clusters >= mincells. If a cluster's ncells < mincells, the cluster is excluded from marker analysis
- <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
- <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations.
This parameter is mandatory if pseudo_seurat is set to True
- <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells.
This parameter is mandatory if pseudo_seurat is set to True
- <span class="parameter">prot:</span><br>
- <span class="parameter">run </span> `Boolean`, Default: True<br>
@@ -219,8 +230,10 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
- <span class="parameter">method </span> `String`, Default: wilcoxon<br>
- <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
- <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations.
This parameter is mandatory if pseudo_seurat is set to True
- <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells.
This parameter is mandatory if pseudo_seurat is set to True

- <span class="parameter">atac:</span><br>
@@ -234,8 +247,10 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
- <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
- <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations.
This parameter is mandatory if pseudo_seurat is set to True
- <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells.
This parameter is mandatory if pseudo_seurat is set to True


@@ -246,9 +261,9 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
Options include: ‘logreg’, ‘t-test’, ‘wilcoxon’, ‘t-test_overestim_var’
- <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
- <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
This parameter is mandatory if pseudo_seurat is set to True
Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. This parameter is mandatory if pseudo_seurat is set to True
- <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
This parameter is mandatory if pseudo_seurat is set to True
Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells. This parameter is mandatory if pseudo_seurat is set to True


- <span class="parameter">spatial:</span><br>
@@ -261,11 +276,12 @@ When pseudo_seurat is set to True then a [python implementation](https://github.
Marker analysis is run for clusters >= mincells. If a cluster's ncells < mincells, the cluster is excluded from marker analysis
- <span class="parameter">pseudo_seurat </span> `Boolean`, Default: False<br>
- <span class="parameter">minpct </span> `Float`, Default: 0.1<br>
This parameter is mandatory if pseudo_seurat is set to True
Only test genes that are detected in a minimum fraction of min.pct cells in either of the two populations. This parameter is mandatory if pseudo_seurat is set to True
- <span class="parameter">threshuse </span> `Float`, Default: 0.25<br>
Limit testing to genes which show, on average, at least X-fold difference (log-scale) between the two groups of cells.
This parameter is mandatory if pseudo_seurat is set to True
## Plot specifications
Used to define which metadata columns are used in the visualizations
Define which layers are used in the markers visualization
- <span class="parameter">plotspecs:</span><br>
- <span class="parameter">layers: </span><br>
- <span class="parameter">rna </span> `String`, Default: logged_counts<br>
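The marker parameters documented in this file combine three filters: clusters smaller than `mincells` are skipped, and with `pseudo_seurat` enabled, genes must be detected in at least `minpct` of cells in either population and differ by at least `threshuse` on the log scale. The sketch below illustrates these filters with NumPy. It assumes log-normalised expression and takes the log fold change as the absolute difference of mean log-expression. That choice is an assumption, and the referenced python implementation of Seurat's test may compute it differently (for example with another log base or pseudocount).

```python
import numpy as np


def genes_to_test(x_group, x_rest, minpct=0.1, threshuse=0.25):
    """Boolean mask of genes passing the minpct and threshuse filters.

    x_group, x_rest: cells x genes arrays of log-normalised expression.
    """
    # Fraction of cells in each population where the gene is detected (> 0)
    pct_group = (x_group > 0).mean(axis=0)
    pct_rest = (x_rest > 0).mean(axis=0)
    # minpct: keep genes detected in at least minpct of cells in either population
    detected = np.maximum(pct_group, pct_rest) >= minpct
    # threshuse: assumed here to apply to |difference of mean log-expression|
    logfc = x_group.mean(axis=0) - x_rest.mean(axis=0)
    return detected & (np.abs(logfc) >= threshuse)


def clusters_for_markers(labels, mincells):
    """Cluster labels with at least mincells cells; smaller clusters are skipped."""
    clusters, counts = np.unique(labels, return_counts=True)
    return [str(c) for c, n in zip(clusters, counts) if n >= mincells]


x_group = np.array([[1.0, 0.0, 2.0], [1.0, 0.0, 2.0]])
x_rest = np.array([[0.0, 0.0, 2.0], [0.0, 0.0, 2.0]])
print(genes_to_test(x_group, x_rest))  # [ True False False]
print(clusters_for_markers(np.array(["0", "0", "0", "1"]), mincells=2))  # ['0']
```

In the toy data, the second gene is never detected and fails `minpct`. The third gene is detected in both groups but has no expression difference, so it fails `threshuse`.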
3 changes: 2 additions & 1 deletion panpipes/panpipes/pipeline_clustering.py
@@ -43,9 +43,10 @@ def set_up_dirs(log_file):
## Single modality scripts
## ------------------------------------

# -----------------------------------=
# --------------------------------------
# neighbors
# --------------------------------------
# TO DO create task to re-run neighbours on multimodal outer representations (this script can only read in each mod layer)
@follows(set_up_dirs)
@originate(PARAMS['mudata_with_knn'])
def run_neighbors(outfile):
7 changes: 6 additions & 1 deletion panpipes/panpipes/pipeline_clustering/pipeline.yml
@@ -29,7 +29,7 @@ modalities:
atac: False
spatial: False

# if True, will look for WNN, or totalVI output
# if True, will look for WNN, mofa, multivi, totalVI embeddings
multimodal:
run_clustering: True
integration_method:
@@ -40,22 +40,26 @@ multimodal:
# ---------------------------------------
#
# -----------------------------

neighbors:
rna:
#use the knn calculated in the integration workflow. If False it will recalculate
use_existing: True
dim_red: X_pca
n_dim_red: 30
k: 30
metric: euclidean
method: scanpy
prot:
#use the knn calculated in the integration workflow. If False it will recalculate
use_existing: True
dim_red: X_pca
n_dim_red: 30
k: 30
metric: euclidean
method: scanpy
atac:
#use the knn calculated in the integration workflow. If False it will recalculate
use_existing: True
dim_red: X_lsi
dim_remove: 1
@@ -64,6 +68,7 @@ neighbors:
metric: euclidean
method: scanpy
spatial:
#use the knn calculated in the integration workflow. If False it will recalculate
use_existing: False
dim_red: X_pca
n_dim_red: 30
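Under `multimodal:`, a blank `integration_method:` is loaded by a YAML parser as `None`. The docs say only WNN stores its neighbours under a non-default `--neighbors_key`, while totalVI, MultiVI and MOFA use the default. A minimal sketch of that selection follows. The default `"neighbors"` matches `run_umap.py`'s `--neighbors_key` default. The `"wnn"` key name is hypothetical, because the actual key is not given in this commit.

```python
from typing import Optional

DEFAULT_KEY = "neighbors"  # default of run_umap.py's --neighbors_key
WNN_KEY = "wnn"            # hypothetical: the real WNN neighbours key is not stated here


def multimodal_neighbors_key(integration_method: Optional[str]) -> str:
    """Choose which .uns neighbours entry multimodal clustering should read.

    A blank YAML value arrives as None; only WNN uses a non-default key.
    """
    if integration_method is not None and integration_method.strip().lower() == "wnn":
        return WNN_KEY
    return DEFAULT_KEY


print(multimodal_neighbors_key(None))   # neighbors
print(multimodal_neighbors_key("WNN"))  # wnn
```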
2 changes: 1 addition & 1 deletion panpipes/python_scripts/run_umap.py
@@ -33,7 +33,7 @@
default=0.1,
help="no. neighbours parameters for sc.pp.neighbors()")
parser.add_argument("--neighbors_key",
default="neighbors", help="algortihm choice from louvain and leiden")
default="neighbors", help="name of the saved knn neighbors")

args, opt = parser.parse_known_args()
L.info(args)
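This hunk corrects the help text of `--neighbors_key`, which had been copied from an algorithm option. The stdlib sketch below shows how the option behaves when parsed with `parse_known_args`, as `run_umap.py` does. Only this one option is reproduced, and the `"wnn"` value and `--unknown_flag` are made-up inputs for illustration.

```python
import argparse

parser = argparse.ArgumentParser()
# Mirrors the option touched by this commit; other run_umap.py options omitted
parser.add_argument("--neighbors_key", default="neighbors",
                    help="name of the saved knn neighbors")

# parse_known_args returns (namespace, leftover args) instead of erroring on unknown flags
args, opt = parser.parse_known_args(["--neighbors_key", "wnn", "--unknown_flag", "1"])
print(args.neighbors_key)  # wnn
print(opt)                 # ['--unknown_flag', '1']
```

Omitting the flag leaves `args.neighbors_key` at `"neighbors"`, the key under which scanpy's `pp.neighbors` stores its graph by default.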