diff --git a/docs/yaml_docs/pipeline_visualization_yml.md b/docs/yaml_docs/pipeline_visualization_yml.md
new file mode 100644
index 00000000..f5c26330
--- /dev/null
+++ b/docs/yaml_docs/pipeline_visualization_yml.md
@@ -0,0 +1,177 @@
+
+
+# Visualization YAML
+
+In this documentation, the parameters of the `visualization` configuration yaml file are explained.
+This file is generated by running `panpipes vis config`.
+The individual steps run by the pipeline are described in the [visualization workflow](https://panpipes-pipelines.readthedocs.io/en/latest/workflows/vis.html).
+When running the visualization workflow, panpipes provides a basic `pipeline.yml` file.
+To run the workflow on your own data, you need to specify the parameters described below in the `pipeline.yml` file to meet the requirements of your data.
+However, we do provide pre-filled versions of the `pipeline.yml` file for individual [tutorials](https://panpipes-pipelines.readthedocs.io/en/latest/tutorials/index.html).
+
+For more information on functionalities implemented in `panpipes` to read the configuration files, such as reading blocks of parameters and reusing blocks with `&anchors` and `*aliases`, please check [our documentation](./useful_info_on_yml.md).
+
+You can download the different visualization `pipeline.yml` files here:
+- Basic `pipeline.yml` file (not prefilled) that is generated when calling `panpipes vis config`: [Download here](https://github.com/DendrouLab/panpipes/blob/main/panpipes/panpipes/pipeline_vis/pipeline.yml)
+- `pipeline.yml` file for [Visualizing data Tutorial](https://panpipes-tutorials.readthedocs.io/en/latest/visualization/pipeline_yml.html): [Download here](https://panpipes-tutorials.readthedocs.io/en/latest/_downloads/29daa86241829b362152785caf30ab61/pipeline.yml)
+
+## Compute resources options
+resources
+Computing resources to use, specifically the number of threads used for parallel jobs.
+Specified by the following three parameters:
+ - threads_high `Integer`, Default: 1
+ Number of threads used for high intensity computing tasks.
+ For each thread, there must be enough memory to load all your input files at once and create the MuData object.
+
+ - threads_medium `Integer`, Default: 1
+ Number of threads used for medium intensity computing tasks.
+ For each thread, there must be enough memory to load your MuData object and run computationally light tasks.
+
+ - threads_low `Integer`, Default: 1
+ Number of threads used for low intensity computing tasks.
+ For each thread, there must be enough memory to load text files and do plotting; this requires much less memory than the other two.
+
+condaenv `String` (Path)
+ Path to the conda environment that should be used to run panpipes.
+ Leave blank if running natively or if your cluster automatically inherits the login node environment.
+
+## Loading and merging data options
+### Data format
+
+sample_prefix `String`, Mandatory parameter, Default: test
+Prefix for the sample that comes out of the filtering/preprocessing steps of the workflow.
+
+mudata_obj `String`, Mandatory parameter
+ Path to the output file from preprocessing (e.g. `../vis/test.h5mu`).
+ Ensure that the submission file is in the right format and that the correct path is provided.
+
+modalities
+rna `Boolean`, Default: True
+prot `Boolean`, Default: True
+atac `Boolean`, Default: False
+rep `Boolean`, Default: True
+multimodal `Boolean`, Default: True
+Set the modalities to True or False depending on what is present in the input `mudata_obj`.
+
+grouping_vars `String`, Default: sample_id rna:leiden_res0.6
+On dot plots and bar plots, grouping vars are used to group other features (for categorical, continuous, and feature plots).
+Should be provided as a list as follows:
+
+```yaml
+grouping_vars:
+  - sample_id
+  - rna:leiden_res0.6
+
+```
+
+## Plot Markers
+
+Check [gene_list_format.md](https://github.com/DendrouLab/panpipes/edit/clustering_g/docs/usage/gene_list_format.md) for Plot marker csv format instructions.
+
+The csv files containing the long/short gene lists for visualisations can be specified in the `vis` configuration file as follows:
+
+pipeline_vis config file: (pipeline.yml)
+
+```yaml
+# the long list will be plotted in dot plots and matrix plots, one plot per group
+full:
+  - long_file1.csv
+  - long_file2.csv
+# the shorter list will be plotted on umaps as well as other plot types, one plot per group
+minimal:
+  - short_file1.csv
+
+```
+custom_markers
+ - files
+
+ - full:
+The long list will be plotted in dot plots and matrix plots, with one plot per group.
+
+ - minimal:
+The shorter list will be plotted on umaps as well as other plot types, with one plot per group.
+
+
+- paired_scatter:`String`, Default:
+ Produces a scatter plot. When different normalisations exist for a modality in the input MuData object, specify which layer to use, or set X or leave blank to use `mdata[mod].X`.
+
+- layers:
+ - rna:`String`, Default: logged_counts
+ - prot:`String`, Default: clr
+ - atac:`String`, Default: signac_norm
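+
+As a rough sketch, the marker options described above could be combined in `pipeline.yml` as shown below. The file names are the placeholders used earlier in this section, and the nesting of `files` and `layers` under `custom_markers` is an assumption inferred from the list indentation on this page, so compare it against the file generated by `panpipes vis config` before use.
+
+```yaml
+# Hypothetical layout: the nesting is inferred from this page, not copied from the generated file
+custom_markers:
+  files:
+    # long lists: dot plots and matrix plots, one plot per group
+    full:
+      - long_file1.csv
+      - long_file2.csv
+    # short list: umaps and other plot types, one plot per group
+    minimal:
+      - short_file1.csv
+  # layer to read for each modality (defaults listed above)
+  layers:
+    rna: logged_counts
+    prot: clr
+    atac: signac_norm
+```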
+
+## Plot metadata variables
+
+- categorical_vars:`String`, Default: &categorical_vars
+ - all:`String`, Default: rep:receptor_subtype sample_id
+Metrics to be plotted on every modality.
+ - rna:`String`, Default: rna:predicted_doublets rna:phase
+ - prot:`String`, Default: prot:leiden_res0.2 prot:leiden_res1
+ - atac:`String`, Default:
+ - rep:`String`, Default: rep:has_ir
+ - multimodal:`String`, Default: leiden_totalVI mdata_colsr
+
+- continuous_vars:`String`, Default: &continuous_vars
+ - all:`String`, Default: leiden_res0.5
+Metrics to be plotted on every modality.
+ - rna:`String`, Default: rna:total_counts
+ - prot:`String`, Default: prot:total_counts
+ - atac:`String`, Default:
+ - multimodal:`String`, Default: rna:total_counts prot:total_counts
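+
+A subset of the defaults listed above could be written out roughly as follows. This is only a sketch: it assumes each modality key takes a YAML list in the same way as `grouping_vars`, and it leaves out some of the keys for brevity. The `&categorical_vars` and `&continuous_vars` anchors come from the defaults above and label each block so that it can be reused elsewhere in the file.
+
+```yaml
+# Sketch only: list-style values are an assumption modelled on grouping_vars
+categorical_vars: &categorical_vars
+  all:
+    - rep:receptor_subtype
+    - sample_id
+  rna:
+    - rna:predicted_doublets
+    - rna:phase
+  rep:
+    - rep:has_ir
+continuous_vars: &continuous_vars
+  all:
+    - leiden_res0.5
+  rna:
+    - rna:total_counts
+  prot:
+    - prot:total_counts
+  multimodal:
+    - rna:total_counts
+    - prot:total_counts
+```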
+
+- `String`, Default: scatter_features.csv
+
+## Plot style
+Choose the plot type desired.
+- do_plots:
+
+ Plot each categorical variable as a bar plot.
+ For example, categorical variable "cluster" on the x axis and number of cells on the y axis.
+ - categorical_barplots:`Boolean`, Default: True
+
+ Plot each grouping var as a bar plot, with categorical variables stacked.
+ For example, grouping var "sample_id" on the x axis and number of cells on the y axis, colored by categorical variable "cluster" in a stack.
+ - categorical_stacked_barplots:`Boolean`, Default: True
+
+ Plot each continuous variable as a violin plot.
+ For example, grouping var "sample_id" on the x axis and the continuous variable "doublet_scores" on the y axis.
+ - continuous_violin:`Boolean`, Default: True
+
+ Plot marker dotplots as produced by `scanpy.pl.dotplot`.
+ - marker_dotplots:`Boolean`, Default: True
+
+ Plot marker matrixplots as produced by `scanpy.pl.matrixplot`.
+ - marker_matrixplots:`Boolean`, Default: True
+
+ Plot scatter plots as defined in the paired_scatters csv file (scatter_features.csv).
+ - paired_scatters:`Boolean`, Default: True
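+
+Put together, the switches above might appear in `pipeline.yml` as in this sketch. It assumes that each switch is nested directly under `do_plots`, and it keeps the listed defaults except for the two marker plots, which are switched off here purely for illustration.
+
+```yaml
+# Sketch only: nesting under do_plots is assumed from the list above
+do_plots:
+  categorical_barplots: True
+  categorical_stacked_barplots: True
+  continuous_violin: True
+  # disabled here as an example; the default for both is True
+  marker_dotplots: False
+  marker_matrixplots: False
+  paired_scatters: True
+```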
+
+- embedding:
+Define the embedding plots (e.g. UMAP, PCA) using the modality and embedding basis specified. This will plot all markers in the minimal markers csv, as well as the categorical and continuous variables.
+ - rna:
+ - run:`Boolean`, Default: True
+ - basis:`String`, Default: X_umap_mindist_0.25
+ - prot:
+ - run:`Boolean`, Default: True
+ - basis:`String`, Default: X_umap X_pca
+
+ - atac:
+ - run:`Boolean`, Default: False
+ - basis:`String`, Default: X_umap
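+
+The embedding defaults above could be laid out as in the following sketch. Writing `basis` as a YAML list, and the exact nesting under `embedding`, are assumptions based on how other multi-value parameters are shown on this page, so check them against the generated `pipeline.yml`.
+
+```yaml
+# Sketch only: list-style basis values are an assumption
+embedding:
+  rna:
+    run: True
+    basis:
+      - X_umap_mindist_0.25
+  prot:
+    run: True
+    basis:
+      - X_umap
+      - X_pca
+  atac:
+    run: False
+    basis:
+      - X_umap
+```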
+ + + + + + + + + +