integration pipeline.yml modified until ATAC modality
Lilly-May committed Mar 6, 2024
1 parent 1637535 commit 3788e93
Showing 3 changed files with 86 additions and 127 deletions.
81 changes: 42 additions & 39 deletions docs/yaml_docs/pipeline_integration_yml.md
@@ -16,41 +16,45 @@ When running the integration workflow, panpipes provides you with a basic `pipeline.yml`

You can download the different integration pipeline.yml files here:
- Basic `pipeline.yml` file (not pre-filled) that is generated when calling `panpipes integration config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_integration/pipeline.yml)
- `pipeline.yml` for Integration tutorial: [View and Download here](https://panpipes-tutorials.readthedocs.io/en/latest/uni_multi_integration/pipeline_yml.html)

For more information on the functionality implemented in `panpipes` for reading the configuration files, such as reading blocks of parameters and reusing blocks with `&anchors` and `*scalars`, please check [our documentation](./useful_info_on_yml.md).
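As a minimal sketch of how such reuse works (the `neighbors: &rna_neighbors` block is the one defined in the integration `pipeline.yml` below; the reusing key is hypothetical and shown only to illustrate the syntax):

```yaml
# Define a parameter block once and label it with an anchor (&)
neighbors: &rna_neighbors
  npcs: 30
  k: 30
  metric: euclidean
  method: scanpy

# Anywhere later in the same file, reuse the whole block via the matching alias (*)
wnn_neighbors: *rna_neighbors   # hypothetical key, shown only to illustrate reuse
```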

## Compute resources options

- <span class="parameter">resources</span>

<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Specified by the following parameters:

- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
For each thread, there must be enough memory to load your MuData object which was created in the preprocessing step of
the workflow.
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
For each thread, there must be enough memory to load your MuData object which was created in the preprocessing step of
the workflow.

- <span class="parameter">threads_medium</span> `Integer`, Default: 1<br>
Number of threads used for medium intensity computing tasks.
For each thread, there must be enough memory to load your mudata and do computationally light tasks.
- <span class="parameter">threads_medium</span> `Integer`, Default: 1<br>
Number of threads used for medium intensity computing tasks.
For each thread, there must be enough memory to load your mudata and do computationally light tasks.

- <span class="parameter">threads_low</span> `Integer`, Default: 1<br>
Number of threads used for low intensity computing tasks.
For each thread, there must be enough memory to load text files and do plotting, requires much less memory than the other two.
- <span class="parameter">threads_gpu</span> `Integer`, Default: 2<br>
Number of threads used for low intensity computing tasks.
For each thread, there must be enough memory to load text files and do plotting, requires much less memory than the other two.
- <span class="parameter">threads_gpu</span> `Integer`, Default: 2<br>
Number of cores per gpu used for computing tasks.
For each thread, there must be enough memory to compute the tasks above.

<span class="parameter">condaenv</span> `String`<br>
Path to conda environment that should be used to run panpipes.
Leave blank if running natively or if your cluster automatically inherits the login node environment.

<span class="parameter">queues</span><br>
Allows for tweaking which queues jobs get submitted to, in case there is a special queue for long jobs, or you have access to a gpu-specific queue.
The default queue should be specified in your .cgat.yml file.
Leave blank if you do not want to use any alternative queues.
- <span class="parameter">long</span><br>
- <span class="parameter">gpu</span><br>

## Loading and merging data options
### Data format


<span class="parameter">sample_prefix</span> `String`, Mandatory parameter, Default: test<br>
Prefix for the sample that comes out of the filtering/preprocessing steps of the workflow.

@@ -60,9 +64,9 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th

## Batch correction

**Batch correction is done unimodally, meaning each modality is batch corrected independently.**

### RNA modality

<span class="parameter">rna:</span>
Batch correction for the RNA modality is specified by the following parameters:
@@ -80,7 +84,7 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th

The column name of the covariate you want to batch correct on; if a comma-separated list is specified, all columns will be used simultaneously.
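For example, correcting on two covariates at once could look like the sketch below (the `tissue` column is hypothetical; use column names that exist in your own metadata):

```yaml
rna:
  run: True
  tools: harmony
  # comma-separated, no spaces; the listed columns are merged into one 'batch' column
  column: sample_id,tissue
```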

#### Harmony arguments

- <span class="parameter">harmony:</span>
Basic parameters required to run harmony:
@@ -91,14 +95,14 @@ Prefix for the sample that comes out of the filtering/ preprocessing steps of th

For more information on `harmony` check the [harmony documentation](https://portals.broadinstitute.org/harmony/reference/RunHarmony.html)

#### BBKNN arguments

- <span class="parameter">bbknn:</span>
- <span class="parameter">neighbors_within_batch:</span> `Integer`, Default: 3<br>

For more information on `bbknn` check the [bbknn documentation](https://bbknn.readthedocs.io/en/latest/)

#### SCVI arguments
- <span class="parameter">scvi</span>: SCVI parameters are specified as
- <span class="parameter">exclude_mt_genes:</span> `Boolean`, Default: True<br>
- <span class="parameter">exclude_mt_genes:</span> `String`, Default: mt<br>
@@ -134,7 +138,7 @@ For more information on `bbknn` check the [bbknn documentation](https://bbknn.re
For more information on `scvi` check the [scvi documentation](https://docs.scvi-tools.org/en/stable/api/reference/scvi.model.SCVI.html)

#### Find neighbour parameters
Parameters to compute the connectivity graph on RNA

- <span class="parameter">neighbors:</span> `String`<br>
@@ -152,7 +156,7 @@ Parameters to compute the connectivity graph on RNA
The method can be either `scanpy` or `hnsw`.


### Protein modality
<span class="parameter">prot:</span>
Batch correction for the protein modality is specified by the following parameters:

@@ -168,34 +172,33 @@ Parameters to compute the connectivity graph on Protein

The column you want to batch correct on; if a comma-separated list is specified, all will be used simultaneously.

#### Harmony arguments

<span class="parameter">harmony</span><br>
Basic parameters required to run harmony:

- <span class="parameter">sigma</span> `Float`, Default: 0.1<br>
- <span class="parameter">theta</span> `Float`, Default: 1.0<br>
- <span class="parameter">npcs</span> `Integer`, Default: 30<br>

For more information on `harmony` check the [harmony documentation](https://portals.broadinstitute.org/harmony/reference/RunHarmony.html)

#### BBKNN arguments

<span class="parameter">bbknn</span><br>
- <span class="parameter">neighbors_within_batch:</span> `Integer`, Default: 3<br>

For more information on `bbknn` check the [bbknn documentation](https://bbknn.readthedocs.io/en/latest/)

#### Find neighbour parameters

Parameters to compute the connectivity graph on Protein

- <span class="parameter">neighbors:</span> `String`, Default: &prot_neighbors<br>
<span class="parameter">neighbors</span> `String`, Default: &prot_neighbors<br>

- <span class="parameter">npcs</span> `Integer`, Default: 30<br>
Number of principal components to calculate for neighbors and Umap
- <span class="parameter">npcs</span> `Integer`, Default: 30<br>
Number of principal components to calculate for neighbors and Umap

- <span class="parameter">k</span> `Integer`, Default: 30<br>
Number of neighbors
@@ -207,7 +210,7 @@ Parameters to compute the connectivity graph on Protein
The method can be either `scanpy` or `hnsw`.


### ATAC modality

<span class="parameter">atac:</span>
Batch correction for the ATAC modality is specified by the following parameters:
@@ -226,7 +229,7 @@ Parameters to compute the connectivity graph on Protein

The column you want to batch correct on; if a comma-separated list is specified, all will be used simultaneously.

#### Harmony arguments

- <span class="parameter">harmony:</span>
Basic parameters required to run harmony:
3 changes: 2 additions & 1 deletion docs/yaml_docs/pipeline_preprocess_yml.md
@@ -424,9 +424,10 @@ Whether applying scaling or not is still a matter of debate, as stated in the [L
- <span class="parameter">color_by</span> `String`, Default: sample_id<br>
Specify the covariate you want to use to color the dimensionality reduction plot.

- <span class="parameter">dim_remove</span> `TODO`<br>
- <span class="parameter">dim_remove</span> `Integer`<br>
Whether to remove the component(s) associated to technical artifacts.
For instance, it is common to remove the first LSI component, as it is often associated with batch effects.
Specify `1` to remove the first component.
Leave blank to avoid removing any.
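As a minimal sketch (the surrounding keys and exact nesting of the preprocessing `pipeline.yml` are omitted here), the two parameters above could be set as:

```yaml
color_by: sample_id
dim_remove: 1   # drop the first LSI component; leave blank to keep all components
```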


129 changes: 42 additions & 87 deletions panpipes/panpipes/pipeline_integration/pipeline.yml
@@ -1,76 +1,59 @@
# ============================================================
# Integration workflow Panpipes (pipeline_integration.py)
# ============================================================
# written by Charlotte Rich-Griffin, Fabiola Curion
# This file contains the parameters for the integration workflow.
# For full descriptions of the parameters, see the documentation at https://panpipes-pipelines.readthedocs.io/en/latest/yaml_docs/pipeline_integration_yml.html

#--------------------------
# Compute resources options
#--------------------------
resources:
# Number of threads used for parallel jobs
# this must be enough memory to load your mudata and do computationally intensive tasks
threads_high: 1
# this must be enough memory to load your mudata and do computationally light tasks
threads_medium: 1
# this must be enough memory to load text files and do plotting, requires much less memory than the other two
threads_low: 1
# if you have access to a gpu-specific queue, how many gpu threads to request; make sure to edit the queues section below,
# so that panpipes can find your gpu queue

threads_gpu: 2
# path to conda env, leave blank if running natively or your cluster automatically inherits the login node environment

condaenv:

# allows for tweaking which queues jobs get submitted to,
# in case there is a special queue for long jobs or you have access to a gpu-specific queue
# the default queue should be specified in your .cgat.yml file
# leave blank if you do not want to use the alternative queues
queues:
long:
gpu:

# --------------------------------
# Loading and merging data options
# --------------------------------

# ----------------------------
# Data format
sample_prefix: test
#this is what comes out of the filtering/preprocessing
preprocessed_obj: ../preprocess/test.h5mu
# contains layers: raw_counts, logged_counts, and has scaled or logged counts in X


#-----------------
# Batch correction
# ----------------
# Batch correction is done unimodally, meaning each modality is batch corrected independently

# ------------
# RNA modality
rna:
# True or false depending on whether you want to run batch correction
run: True
# what method(s) to use to run batch correction, you can specify multiple
# choices: harmony,bbknn,scanorama,scvi (comma-separated string, no spaces)
tools: harmony,bbknn,scanorama,scvi
# this is the column you want to batch correct on. if you specify a comma separated list,
# they will all be used simultaneously.
# Specifically all columns specified will be merged into one 'batch' column.
# if you want to test correction for one at a time,
# specify one at a time and run the pipeline in different folders i.e. integration_by_sample,
# integration_by_tissue ...
column: sample_id

# Harmony arguments
harmony:
# sigma value, used by Harmony
sigma: 0.1
# theta value used by Harmony, default is 1
theta: 1.0
# number of pcs, used by Harmony
npcs: 30
#----------------------------

# BBKNN args # https://bbknn.readthedocs.io/en/latest/
#-----------------------------
bbknn:
neighbors_within_batch:
#-----------------------------

# SCVI args
#-----------------------------
scvi:
exclude_mt_genes: True
mt_column: mt
@@ -89,68 +72,40 @@ rna:
lr_scheduler_metric:
lr_patience: 8
lr_factor: 0.1
# to reuse these params (for example for WNN), please use anchors (&) and scalars (*) in the relevant place
# i.e. &rna_neighbors will be called by *rna_neighbors where referenced

# Find neighbour parameters
neighbors: &rna_neighbors
# number of Principal Components to calculate for neighbours and umap:
# -if no correction is applied, PCA will be calculated and used to run UMAP and clustering on
# -if Harmony is the method of choice, it will use these components to create a corrected dim red.
# the maximum number of dims for neighbors calculation can only be lower or equal to the total number of dims for PCA or Harmony
# note: scvelo default is 30
npcs: 30
# number of neighbours
k: 30
# metric: euclidean | cosine
metric: euclidean
# scanpy | hnsw (from scvelo)
method: scanpy

# ----------------
# Protein modality
prot:
# True or false depending on whether you want to run batch correction
run: True
# what method(s) to use to run batch correction, you can specify multiple
# choices: harmony,bbknn,combat
tools: harmony
# this is the column you want to batch correct on. if you specify a comma separated list (no spaces),
# they will all be used simultaneously. if you want to test correction for one at a time,
# specify one at a time and run the pipeline in different folders i.e. integration_by_sample,
# integration_by_tissue ...
column: sample_id
#----------------------------

# Harmony args
#-----------------------------
harmony:
# sigma value, used by Harmony
sigma: 0.1
# theta value used by Harmony, default is 1
theta: 1.0
# number of pcs, used by Harmony
npcs: 30
#----------------------------

# BBKNN args # https://bbknn.readthedocs.io/en/latest/
#-----------------------------
bbknn:
neighbors_within_batch:

# Find neighbour parameters
neighbors: &prot_neighbors
# number of Principal Components to calculate for neighbours and umap:
# -if no correction is applied, PCA will be calculated and used to run UMAP and clustering on
# -if Harmony is the method of choice, it will use these components to create a corrected dim red.
# note: scvelo default is 30
npcs: 30
# number of neighbours
k: 30
# metric: euclidean | cosine
metric: euclidean
# scanpy | hnsw (from scvelo)
method: scanpy

# -------------
# ATAC modality
atac:
# True or false depending on whether you want to run batch correction
run: False
