Visualisation documentation #255

Merged
merged 13 commits into from
Apr 26, 2024
161 changes: 161 additions & 0 deletions docs/yaml_docs/pipeline_visualization_yml.md
@@ -0,0 +1,161 @@
<style>
.parameter {
border-top: 4px solid lightblue;
background-color: rgba(173, 216, 230, 0.2);
padding: 4px;
display: inline-block;
font-weight: bold;
}
</style>

# Visualization YAML

In this documentation, the parameters of the `visualization` configuration yaml file are explained.
This file is generated by running `panpipes vis config`. <br> The individual steps run by the pipeline are described in the [visualization workflow](https://panpipes-pipelines.readthedocs.io/en/latest/workflows/vis.html).
When running the visualization workflow, panpipes provides a basic `pipeline.yml` file.
To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.
However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html).

For more information on the functionalities implemented in `panpipes` to read the configuration files, such as reading blocks of parameters and reusing blocks with `&anchors` and `*aliases`, please check [our documentation](./useful_info_on_yml.md).
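
As a minimal illustration of reusing a block with a YAML anchor (`&name`) and alias (`*name`), consider the sketch below. The key names are hypothetical and chosen only for this example; they are not taken from the panpipes `pipeline.yml`.

```yaml
# hypothetical keys, for illustration only
shared_vars: &shared_vars      # "&shared_vars" labels this block with an anchor
  - sample_id
  - rna:phase
barplot_vars: *shared_vars     # "*shared_vars" reuses the anchored block here
violin_vars: *shared_vars      # the same block can be reused any number of times
```

A YAML parser expands each alias into the anchored content, so both reusing keys end up with the same two-item list.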

You can download the different visualization `pipeline.yml` files here:
- Basic `pipeline.yml` file (not prefilled) that is generated when calling `panpipes vis config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_vis/pipeline.yml)
- `pipeline.yml` file for [Visualizing data Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/visualization/pipeline_yml.html): [Download here](https://panpipes-tutorials.readthedocs.io/en/latest/_downloads/29daa86241829b362152785caf30ab61/pipeline.yml)

## Compute resources options
<span class="parameter">resources</span><br>
Computing resources to use, specifically the number of threads used for parallel jobs.
Specified by the following three parameters:
- <span class="parameter">threads_high</span> `Integer`, Default: 1<br>
Number of threads used for high intensity computing tasks.
For each thread, there must be enough memory to load all your input files at once and create the MuData object.

- <span class="parameter">threads_medium</span> `Integer`, Default: 1<br>
Number of threads used for medium intensity computing tasks.
For each thread, there must be enough memory to load your mudata and do computationally light tasks.

- <span class="parameter">threads_low</span> `Integer`, Default: 1<br>
Number of threads used for low intensity computing tasks.
For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.

<span class="parameter">condaenv</span> `String` (Path)<br>
Path to conda environment that should be used to run panpipes.
Leave blank if running native or if your cluster automatically inherits the login node environment.
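
As a rough sketch, the compute resource settings described above could look like this in `pipeline.yml`. The values are the documented defaults. Placing `condaenv` at the top level rather than inside `resources` is an assumption, so check the generated file for the exact layout.

```yaml
# sketch only: values are the documented defaults
resources:
  threads_high: 1     # high intensity tasks (loading all inputs, creating the MuData object)
  threads_medium: 1   # medium intensity tasks (loading the MuData, light computation)
  threads_low: 1      # low intensity tasks (reading text files, plotting)
# assumption: top-level key; leave blank if running native
condaenv:
```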

## Loading and merging data options
### Data format

<span class="parameter">sample_prefix</span> `String`, Mandatory parameter, Default: test<br>
Prefix for the sample that comes out of the filtering/preprocessing steps of the workflow.

<span class="parameter">mudata_obj</span> `String`, Mandatory parameter <br>
Path to the output file from preprocessing (e.g. `../vis/test.h5mu`).
Ensure that the submission file is in the right format and that the correct path is provided.

<span class="parameter">modalities</span><br>
<span class="parameter">rna</span> `Boolean`, Default: True <br>
<span class="parameter">prot</span> `Boolean`, Default: True <br>
<span class="parameter">atac</span> `Boolean`, Default: False <br>
<span class="parameter">rep</span> `Boolean`, Default: True <br>
<span class="parameter">multimodal</span> `Boolean`, Default: True <br>
Set each modality to True or False depending on what is present in the `mudata_obj` file.

<span class="parameter">grouping_vars</span> `String`, Default: sample_id rna:leiden_res0.6 <br>
On dot plots and bar plots, grouping vars are used to group other features (for categorical, continuous, and feature plots).
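
The data loading parameters above might be combined as in the following sketch. The values are the documented defaults and the example path. Writing `grouping_vars` as a YAML list is an assumption, since the default above is shown space-separated.

```yaml
# sketch only: documented defaults and example path
sample_prefix: test
mudata_obj: ../vis/test.h5mu
modalities:
  rna: True
  prot: True
  atac: False
  rep: True
  multimodal: True
# assumption: one grouping variable per list entry
grouping_vars:
  - sample_id
  - rna:leiden_res0.6
```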

## Plot Markers

The csv files can be specified in the `vis` configuration file as follows:

pipeline_vis config file (`pipeline.yml`):

```yaml
# the full list will be plotted in dot plots and matrix plots, one plot per group
full:
- long_file1.csv
- long_file2.csv
# the shorter list will be plotted on umaps as well as other plot types, one plot per group
minimal:
- short_file1.csv

```
<span class="parameter">custom_markers</span><br>
- <span class="parameter">files</span><br>

- <span class="parameter">full:</span><br>
The full list will be plotted in dot plots and matrix plots, with one plot per group.

- <span class="parameter">minimal:</span><br>
The shorter list will be plotted on umaps as well as other plot types, with one plot per group.


- <span class="parameter">paired_scatter:</span>`String`, Default: <br>

- <span class="parameter">layers:</span><br>
Where different normalizations exist in a modality, choose which one to use. Set `X` or leave blank to use the `mdata[mod].X` assay.
- <span class="parameter">rna:</span>`String`, Default: logged_counts<br>
- <span class="parameter">prot:</span>`String`, Default: clr<br>
- <span class="parameter">atac:</span>`String`, Default: signac_norm<br>
Check [gene_list_format.md](https://github.com/DendrouLab/panpipes/edit/clustering_g/docs/usage/gene_list_format.md) for Plot marker csv format instructions.
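
Putting the marker parameters together, the section might look like the sketch below. The file names are the placeholders used in the example above, and the layer values are the documented defaults. The nesting shown here (`files` and `layers` both under `custom_markers`) is an assumption inferred from the parameter list, so compare it against the generated `pipeline.yml`.

```yaml
# sketch only: nesting is assumed, values are the documented defaults
custom_markers:
  files:
    full:              # plotted in dot plots and matrix plots
      - long_file1.csv
      - long_file2.csv
    minimal:           # plotted on umaps and other plot types
      - short_file1.csv
    paired_scatter:    # no default given
  layers:
    rna: logged_counts
    prot: clr
    atac: signac_norm
```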


## Plot metadata variables

- <span class="parameter">categorical_vars:</span>`String`, Default: &categorical_vars<br>
- <span class="parameter">all:</span>`String`, Default: rep:receptor_subtype sample_id<br>
Metrics to be plotted on every modality.
- <span class="parameter">rna:</span>`String`, Default: rna:predicted_doublets rna:phase<br>
- <span class="parameter">prot:</span>`String`, Default: prot:leiden_res0.2 prot:leiden_res1<br>
- <span class="parameter">atac:</span>`String`, Default: <br>
- <span class="parameter">rep:</span>`String`, Default: rep:has_ir<br>
- <span class="parameter">multimodal:</span>`String`, Default: leiden_totalVI mdata_colsr<br>

- <span class="parameter">continuous_vars:</span>`String`, Default: &continuous_vars<br>
- <span class="parameter">all:</span>`String`, Default: leiden_res0.5<br>
Metrics to be plotted on every modality.
- <span class="parameter">rna:</span>`String`, Default: rna:total_counts<br>
- <span class="parameter">prot:</span>`String`, Default: prot:total_counts<br>
- <span class="parameter">atac:</span>`String`, Default: <br>
- <span class="parameter">multimodal:</span>`String`, Default: rna:total_counts prot:total_counts<br>

- <span class="parameter">paired_scatter:</span>`String`, Default: scatter_features.csv<br>
Check [gene_list_format.md](https://github.com/DendrouLab/panpipes/edit/clustering_g/docs/usage/gene_list_format.md) for metadata csv format instructions.
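
The metadata variables above could be written roughly as follows. The values are the documented defaults. Writing each variable as its own list entry and placing `paired_scatter` at the top level are both assumptions, since the defaults above are shown space-separated and the nesting is not stated.

```yaml
# sketch only: documented defaults, list layout assumed
categorical_vars: &categorical_vars
  all:                 # plotted on every modality
    - rep:receptor_subtype
    - sample_id
  rna:
    - rna:predicted_doublets
    - rna:phase
  prot:
    - prot:leiden_res0.2
    - prot:leiden_res1
  atac:
  rep:
    - rep:has_ir
  multimodal:
    - leiden_totalVI
    - mdata_colsr
continuous_vars: &continuous_vars
  all:                 # plotted on every modality
    - leiden_res0.5
  rna:
    - rna:total_counts
  prot:
    - prot:total_counts
  atac:
  multimodal:
    - rna:total_counts
    - prot:total_counts
# assumption: top-level key
paired_scatter: scatter_features.csv
```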

## Plot style
Choose which plot types to produce.
- <span class="parameter">do_plots:</span><br>
- <span class="parameter">categorical_barplots:</span>`Boolean`, Default: True<br>
Plot each categorical variable as a bar plot.
- <span class="parameter">categorical_stacked_barplots:</span>`Boolean`, Default: True<br>
Plot each grouping var as a bar plot, with categorical variables stacked.
- <span class="parameter">continuous_violin:</span>`Boolean`, Default: True<br>
Plot each continuous variable as a violin plot.
- <span class="parameter">marker_dotplots:</span>`Boolean`, Default: True<br>
Plots marker dotplots as produced by `scanpy.pl.dotplot`.
- <span class="parameter">marker_matrixplots:</span>`Boolean`, Default: True<br>
Plots marker matrixplots as produced by `scanpy.pl.matrixplot`.
- <span class="parameter">paired_scatters:</span>`Boolean`, Default: True<br>
Plots scatter plots as defined in the paired_scatters csv file (`scatter_features.csv`).

- <span class="parameter">embedding:</span><br>
Define the embeddings to plot for each modality.
- <span class="parameter">rna:</span><br>
- <span class="parameter">run:</span>`Boolean`, Default: True<br>
- <span class="parameter">basis:</span>`String`, Default: X_umap_mindist_0.25<br>
- <span class="parameter">prot:</span><br>
- <span class="parameter">run:</span>`Boolean`, Default: True<br>
- <span class="parameter">basis:</span>`String`, Default: X_umap X_pca<br>

- <span class="parameter">atac:</span><br>
- <span class="parameter">run:</span>`Boolean`, Default: False<br>
- <span class="parameter">basis:</span>`String`, Default: X_umap<br>
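
A sketch of the plot style options with their documented defaults is shown below. Treating `do_plots` and `embedding` as top-level keys, and writing each `basis` as a list, are assumptions; the defaults above are shown space-separated.

```yaml
# sketch only: documented defaults, layout assumed
do_plots:
  categorical_barplots: True
  categorical_stacked_barplots: True
  continuous_violin: True
  marker_dotplots: True
  marker_matrixplots: True
  paired_scatters: True
embedding:
  rna:
    run: True
    basis:
      - X_umap_mindist_0.25
  prot:
    run: True
    basis:
      - X_umap
      - X_pca
  atac:
    run: False         # atac embeddings are skipped by default
    basis:
      - X_umap
```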









