Merge pull request #4846 from nekrut/history_tutorial_update

History tutorial update
galaxyproject · Jul 2, 2024 · b81bd55
2 parents 3e2fb66 + 9a7bb99
commit b81bd55

Showing 41 changed files with 473 additions and 164 deletions.
diff --git a/_config.yml b/_config.yml
@@ -137,7 +137,11 @@ icon-tag:
   galaxy-eye: far fa-eye
   galaxy-gear: fas fa-cog
   galaxy-history: fas fa-columns
+  galaxy-dataset-collapse: fa fa-compress
   galaxy-history-archive: fas fa-archive
+  galaxy-history-size: fas fa-database
+  galaxy-history-storage-choice: fas fa-hard-drive
+  galaxy-history-refresh: fas fa-arrows-rotate
   galaxy-history-input: fas fa-sign-in-alt
   galaxy-history-answer: fas fa-sign-out-alt
   galaxy-home: fas fa-home
@@ -173,6 +177,7 @@ icon-tag:
   help: far fa-question-circle
   history-annotate: fas fa-comment
   history-share: fas fa-share-alt
+  history-select-multiple: fas fa-check-square
   instances: fas fa-globe
   interactive_tour: fas fa-magic
   keypoints: fas fa-key
@@ -237,6 +242,14 @@ icon-tag:
   zenodo_link: far fa-copy
   version: fas fa-code-commit
   rating: far fa-star
+  dataset-rerun: fas fa-arrow-rotate-right
+  dataset-related-datasets: fas fa-sitemap
+  dataset-visualize: fas fa-chart-bar
+  dataset-save: fas fa-save
+  dataset-link: fas fa-link
+  dataset-question: fas fa-question
+  dataset-info: fas fa-info-circle
+  dataset-undelete: fas fa-trash-can-arrow-up
 
 # To exclude in _site
 exclude:
@@ -279,7 +292,7 @@ plugins:
   - jekyll-feed
   - jekyll-redirect-from
 
-# An announcement to display on the home page
+# An announcement to play on the home page
 #announcement:
 #   class: success
 #   title: GTN Celebrates Pride Month

diff --git a/faqs/galaxy/datasets_add_tag.md b/faqs/galaxy/datasets_add_tag.md
@@ -4,13 +4,37 @@ description: Tags can help you to better organize your history and track dataset
 area: datasets
 layout: faq
 box_type: tip
-contributors: [bebatut,wm75,hexylena,shiltemann]
+contributors: [bebatut,wm75,hexylena,shiltemann,nekrut]
 ---
 
+Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces. 
+
+**To tag a dataset**:
+
 1. Click on the dataset to expand it
 2. Click on **Add Tags** {% icon galaxy-tags %}
-3. Add a tag {% if include.tag %}named `{{include.tag}}` {% else %} starting with `#` {% endif %}
-   - Tags starting with `#` will be automatically propagated to the outputs of tools using this dataset.
+3. Add {% if include.tag %} a tag named `{{include.tag}}`{% else %} tag text{% endif %}. Tags starting with `#` will be automatically propagated to the outputs of tools using this dataset (see below).
 4. Press <kbd>Enter</kbd>
 5. Check that the tag appears below the dataset name
 
+**Tags beginning with `#` are special!**
+
+They are called **Name tags**. The unique feature of these tags is that they *propagate*: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below).
+The figure below explains why this is so useful. Consider the following analysis (numbers in parentheses correspond to dataset numbers in the figure below):
+
+1. a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using {% tool Bowtie2 %} generating dataset 3;
+1. dataset 3 is used to calculate read coverage using {% tool BedTools Genome Coverage %} *separately* for `+` and `-` strands. This generates two datasets (4 and 5 for plus and minus, respectively);
+1. datasets 4 and 5 are used as inputs to {% tool Macs2 broadCall %}, generating datasets 6 and 8;
+1. datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using {% tool BedTools Intersect %} generating datasets 10 and 11.
+
+![A history without name tags versus history with name tags]({% link shared/images/histories_why_nametags.svg %})
+
+Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain "plus" data versus "minus" data. For example, does dataset 10 contain "plus" data or "minus" data? Probably "minus" but are you sure? In the case of a small history like the one shown here, it is possible to trace this manually but as the size of a history grows it will become very challenging.
+
+The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted, datasets 4 and 5 were tagged with `#plus` and `#minus`, respectively. When they were used as inputs to Macs2, the resulting datasets 6 and 8 automatically inherited these tags, and so on. As a result it is straightforward to trace both branches (plus and minus) of this analysis.
+
+More information is in a [dedicated #nametag tutorial]({% link topics/galaxy-interface/tutorials/name-tags/tutorial.md %}).
+
+
+<!-- Image is here = https://docs.google.com/drawings/d/1iiNsau6ddiE2MV9qMyekUq2mrpDHHcc02bXtcFEAnhY/edit?usp=sharing -->
+
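Note on the name-tag behaviour described in the FAQ above: the propagation rule can be illustrated with a minimal sketch. This is not Galaxy's implementation; the `Dataset` class and `run_tool` function are hypothetical names, and which of datasets 10 and 11 descends from which branch is an assumption made only for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical stand-in for a history dataset (not Galaxy's data model)."""
    number: int
    tags: set = field(default_factory=set)


def run_tool(output_number, inputs):
    """Create an output dataset that inherits every '#' tag from its inputs."""
    inherited = {tag for ds in inputs for tag in ds.tags if tag.startswith("#")}
    return Dataset(output_number, inherited)


# Datasets 4 and 5 are tagged by hand; the ordinary tag "coverage" is not a name tag.
plus = Dataset(4, {"#plus", "coverage"})
minus = Dataset(5, {"#minus"})
genes = Dataset(9)

peaks_6 = run_tool(6, [plus])
peaks_8 = run_tool(8, [minus])
# Assumption for illustration: dataset 10 derives from 6, dataset 11 from 8.
hits_10 = run_tool(10, [peaks_6, genes])
hits_11 = run_tool(11, [peaks_8, genes])

print(hits_10.tags, hits_11.tags)
```

In this sketch only tags beginning with `#` travel downstream, so every derivative of dataset 4 carries `#plus` while the ordinary `coverage` tag stays behind.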
diff --git a/faqs/galaxy/datasets_deleting.md b/faqs/galaxy/datasets_deleting.md
@@ -0,0 +1,41 @@
+---
+title: How to delete datasets?
+area: datasets
+box_type: tip
+layout: faq
+contributors: [nekrut]
+---
+
+**Deleting datasets individually**
+
+To delete a dataset individually, simply click the {% icon galaxy-delete %} button within the dataset's box. That's it! This action is reversible: datasets can be undeleted.
+
+**Deleting datasets in bulk**
+
+To delete multiple datasets at once:
+
+- Click the {% icon history-select-multiple %} icon at the top of the history pane;
+- Select the datasets you want to delete;
+- Click the dropdown that appears at the top of the history;
+- Select the **"Delete"** option.
+
+This action is also reversible: datasets can be undeleted. 
+
+![An animated gif showing how to delete datasets]({% link shared/images/datasets_deleting.gif %})
+
+**Deleting datasets permanently** <font color="red">{% icon warning %} <b>Danger zone!</b></font>
+
+> <warning-title>Permanent is ... PERMANENT!</warning-title>
+> Datasets deleted in this fashion CANNOT be undeleted!
+{: .warning}
+
+To delete multiple datasets <font color="red">PERMANENTLY</font>:
+
+- Click the {% icon history-select-multiple %} icon at the top of the history pane;
+- Select the datasets you want to delete;
+- Click the dropdown that appears at the top of the history;
+- Select the **"Delete (permanently)"** option.
+
+
+
+
diff --git a/faqs/galaxy/datasets_hidden.md b/faqs/galaxy/datasets_hidden.md
@@ -1,15 +1,18 @@
 ---
-title: How to unhide "hidden datasets"?
+title: How to hide datasets?
 area: datasets
 box_type: tip
 layout: faq
-contributors: [jennaj, beachyesh]
+contributors: [jennaj, beachyesh, nekrut]
 ---
 
-If you have run a workflow with hidden datasets, in your History:
-- Click the **gear icon** {% icon galaxy-gear %} → Click **Unhide Hidden Datasets**
-- Or use the toggle ``hidden`` to view them
+To hide datasets:
+
+- Click the {% icon history-select-multiple %} icon at the top of the history pane;
+- Select the datasets you want to hide;
+- Click the dropdown that appears at the top of the history;
+- Select the **"Hide"** option.
+
+![An animated gif showing how to hide datasets]({% link shared/images/datasets_hide.gif %})
+
 
-When using the [Copy Datasets]({% link faqs/galaxy/histories_copy_dataset.md %}) feature, hidden datasets will not be available to transfer from the **Source History** list of datasets. To include them:
-1. Click the **gear icon** {% icon galaxy-gear %} → Click **Unhide Hidden Datasets**
-2. Click the **gear icon** {% icon galaxy-gear %} → Click **Copy Datasets** 
diff --git a/faqs/galaxy/datasets_multiple.md b/faqs/galaxy/datasets_multiple.md
@@ -0,0 +1,18 @@
+---
+title: Manipulating multiple history datasets
+description: Explains how to manipulate multiple history datasets at once
+area: histories
+layout: faq
+box_type: tip
+contributors: [nekrut]
+---
+
+You can also hide, delete, and purge multiple datasets at once by **multi-selecting datasets**:
+
+1. {% icon galaxy-selector %} Click the multi-select button containing the checkbox just below the history size.
+2. Checkboxes will appear inside each dataset in the history.
+3. Scroll and click the checkboxes next to the datasets you want to manage.
+4. Click the 'n of N selected' to choose the action. The action will be performed on all selected datasets, except for the ones that don't support the action. That is, if an action doesn't apply to a selected dataset, like deleting a deleted dataset, nothing will happen to that dataset, while all other selected datasets will be deleted.
+5. You can click the multi-select button again to hide the checkboxes.
+
+![Operating on multiple datasets]({% link faqs/galaxy/images/multiselect.gif %})
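Note on step 4 of the FAQ above: the rule that a bulk action skips datasets it does not apply to can be sketched as follows. This is a minimal illustration under assumed names (`bulk_delete`, a dictionary-based history); it does not reflect how Galaxy implements the operation.

```python
def bulk_delete(history, selected):
    """Mark the selected datasets as deleted and return the numbers that changed.

    Datasets that are already deleted are skipped, so the action is a
    no-op for them while still applying to the rest of the selection.
    """
    changed = []
    for number in selected:
        dataset = history[number]
        if dataset["deleted"]:
            continue  # the action does not apply; leave this dataset untouched
        dataset["deleted"] = True
        changed.append(number)
    return changed


# Hypothetical history: dataset 2 was deleted earlier.
history = {
    1: {"deleted": False},
    2: {"deleted": True},
    3: {"deleted": False},
}

changed = bulk_delete(history, [1, 2])
print(changed)
```

Here datasets 1 and 2 are selected, only dataset 1 changes state, and the unselected dataset 3 is left alone.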
diff --git a/faqs/galaxy/datasets_undelete.md b/faqs/galaxy/datasets_undelete.md
@@ -0,0 +1,24 @@
+---
+title: How to un-delete datasets?
+area: datasets
+box_type: tip
+layout: faq
+contributors: [nekrut]
+---
+
+If your history contains deleted datasets you will see the {% icon galaxy-delete %} **"Include deleted"** button directly above the dataset display.
+
+To un-delete datasets:
+
+- Type `deleted:true` in the search box;
+- Select the datasets you want to un-delete;
+- Click the dropdown that appears at the top of the history;
+- Select the **"Undelete"** option.
+
+![An animated gif showing how to undelete datasets]({% link shared/images/datasets_undeleting.gif %})
+
+Alternatively, you can:
+
+- click the {% icon galaxy-delete %} **"Include deleted"** button directly above the dataset display. This will cause deleted datasets to appear in the history along with normal (un-deleted) datasets;
+- deleted datasets are distinguished by the {% icon dataset-undelete %} icon within the dataset box. Clicking on this icon will un-delete a given dataset.
+
diff --git a/faqs/galaxy/datasets_unhidden.md b/faqs/galaxy/datasets_unhidden.md
@@ -0,0 +1,24 @@
+---
+title: How to un-hide datasets?
+area: datasets
+box_type: tip
+layout: faq
+contributors: [nekrut]
+---
+
+If your history contains hidden datasets you will see the {% icon galaxy-show-hidden %} **"Include hidden"** button directly above the dataset display.
+
+To un-hide datasets:
+
+- Type `visible:hidden` in the search box
+- Select the datasets you want to un-hide;
+- Click the dropdown that appears at the top of the history;
+- Select the **"Unhide"** option.
+
+![An animated gif showing how to unhide datasets]({% link shared/images/datasets_unhide.gif %})
+
+Alternatively, you can:
+
+- click the {% icon galaxy-show-hidden %} **"Include hidden"** button directly above the dataset display. This will cause hidden datasets to appear in the history along with normal (un-hidden) datasets;
+- hidden datasets are distinguished by the {% icon galaxy-show-hidden %} icon within the dataset box. Clicking on this icon will un-hide a given dataset.
+
diff --git a/faqs/galaxy/histories_annotation.md b/faqs/galaxy/histories_annotation.md
@@ -0,0 +1,24 @@
+---
+title: History annotation
+description: Explains how to annotate a history
+area: histories
+box_type: tip
+layout: faq
+contributors: [nekrut]
+---
+
+Sometimes tags and names are not enough to describe the work done within a history. Galaxy allows you to create history
+annotations: longer text entries that allow for more formatting options. The formatting of the text is preserved. Later, if
+you publish or share the history, the annotation will be displayed automatically - allowing you to share additional
+notes about the analysis. Multiple lines, spaces, and emoji! 😹🏳️‍⚧️🌈 can be used while writing annotations. 
+
+To annotate a history:
+
+1. Click on {% icon galaxy-pencil %} (**Edit**) next to the history name. A larger text section will appear displaying any
+  existing annotation or `Annotation (optional)` if empty.
+2. Add your text. <kbd>Enter</kbd> will move the cursor to the next line. (Tabs cannot be
+  entered since the 'Tab' button is used to switch between controls on the page - tabs can be pasted in, however).
+3. Click on **Save** {% icon galaxy-save %}.
+4. To cancel, click the {% icon galaxy-undo %} "Cancel" button.
+
+![UI for annotating histories]({% link shared/images/history_annotations.png %})
diff --git a/faqs/galaxy/histories_create_new.md b/faqs/galaxy/histories_create_new.md
@@ -7,9 +7,8 @@ layout: faq
 contributors: [bebatut,wm75,shiltemann,hexylena,nomadscientist,nsoranzo,nekrut]
 ---
 
-Click the {% icon new-history %} icon at the top of the history panel:
+To create a new history simply click the {% icon new-history %} icon at the top of the history panel:
 
 ![UI for creating new history]({% link shared/images/history_create_new.svg %})
 
-
-<!-- the original drawing can be found here https://docs.google.com/drawings/d/1cCBrLAo4kDGic5QyB70rRiWJAKTenTU8STsKDaLcVU8/edit?usp=sharing -->
+<!-- the original drawing can be found here https://docs.google.com/drawings/d/1cCBrLAo4kDGic5QyB70rRiWJAKTenTU8STsKDaLcVU8/edit?usp=sharing -->
diff --git a/faqs/galaxy/histories_dataset_colors.md b/faqs/galaxy/histories_dataset_colors.md
@@ -0,0 +1,33 @@
+---
+title: Dataset colors
+description: Explains meaning of dataset colors in Galaxy's history
+area: histories
+box_type: tip
+layout: faq
+contributors: [nekrut]
+---
+
+There are several different "states" a dataset can be in. These states are indicated by colors:
+
+![Colors indicating states of Galaxy datasets]({% link shared/images/galactic_colors.svg %})
+
+- **ok**: everything is fine, life is good;
+- **new**: the dataset was just created. Galaxy does not yet know what it is;
+- **queued**: indicates that the job generating this dataset is scheduled for execution but not running yet;
+- **running**: the job generating this dataset is running;
+- **setting metadata**: when a new dataset is uploaded Galaxy examines it to understand what kind of data it is (e.g., BAM, FASTQ, fasta, BED, etc.). This is called "setting metadata";
+- **deferred**: sometimes it does not make sense to upload the dataset until it is needed for an analysis. Galaxy will download **deferred** datasets later during the job execution. Those datasets do not count toward your quota;
+- **paused**: in some cases (for example, during workflow executions) upstream errors prevent subsequent jobs from starting, leaving their datasets in the "paused" state;
+- **discarded**: something went wrong; for example, the job producing this dataset might have been cancelled;
+- **error**: everything is not fine; life is bad!
+- **placeholder**: similar to "new"; we know something will be there but are not yet sure what;
+- **failed populated state**: this refers to collections (not individual datasets). Here a collection has failed to be populated with datasets;
+- **new populated state**: this refers to collections (not individual datasets). A collection was created but not populated yet.
+
+<!-- original editable image = https://docs.google.com/drawings/d/1F2Lq1m3cMIckvCexXMzug-dqwgifkoMOzoyH4VcoVX0/edit?usp=sharing -->
+
+<!-- TO DO 
+Needs to be linked to FAQs on:
+- how to report errors
+- explaining collections 
+-->
diff --git a/faqs/galaxy/histories_dataset_item.md b/faqs/galaxy/histories_dataset_item.md
@@ -0,0 +1,52 @@
+---
+title: Dataset snippet
+description: Describes features of a single dataset element in the history
+area: histories
+box_type: tip
+layout: faq
+contributors: [nekrut]
+---
+
+A single Galaxy dataset can either be "collapsed" or "expanded".
+
+**Collapsed dataset view**
+
+Datasets in the panel are initially shown in a "collapsed" view:
+
+![Collapsed view of a single Galaxy dataset]({% link shared/images/history_item_collapsed.png %})
+
+It contains the following elements:
+
+- **Dataset number**: ("1") order of dataset in the history;
+- **Dataset name**: ("M117-bl_1.fq.gz") its name;
+- {% icon galaxy-eye %}: click this to view the dataset contents;
+- {% icon galaxy-pencil %}: click this to edit dataset properties;
+- {% icon galaxy-delete %}: click this to delete the dataset from the history (*don't worry*, you can undo this action!).
+
+Clicking on a collapsed dataset will expand it.
+
+> <details-title>Some buttons can be disabled.</details-title>
+> Some of the buttons above may be disabled if the dataset is in a state that doesn't allow the
+> action. For example, the 'edit' button is disabled for datasets that are still queued or running.
+>
+{: .details}
+
+**Expanded dataset view**
+
+Expanded dataset view adds a preview element and many additional controls. 
+
+![Expanded view of a single Galaxy dataset]({% link shared/images/history_item_expanded.png %})
+
+In addition to the elements described above for the collapsed dataset, its expanded view contains:
+
+- **Add tags** {% icon galaxy-tags %}: click on this to tag this dataset;
+- **Dataset size**: ("2 variants, 18 comments") lists the size of the dataset. When datasets are small (like in this example) the exact size is shown. For large datasets, Galaxy gives an approximate estimate.
+- **format**: ("VCF") lists the datatype;
+- **database**: ("?") lists which genome build this dataset corresponds to. This usually lists "?" unless the genome build is set explicitly or the dataset is derived from another dataset with defined genome build information;
+- **info field**: ("INFO [2024-03-26 12:08:53,435]...") displays information provided by the tool that generated this dataset. This varies widely and depends on the type of job that generated this dataset.
+- {% icon dataset-save %}: Saves dataset to disk;
+- {% icon dataset-link %}: Copies dataset link into clipboard;
+- {% icon dataset-info %}: Displays additional details about the dataset in the center pane;
+- {% icon dataset-rerun %}: Reruns job that generated this dataset. This button is unavailable for datasets uploaded into history because they were not produced by a Galaxy tool;
+- {% icon dataset-visualize %}: Displays visualization options for this dataset. The list of options is dependent on the datatype;
+- {% icon dataset-related-datasets %}: Shows datasets related to this dataset. This is useful for tracking down parental datasets - those that were used as inputs into a job that produced this particular dataset.
diff --git a/faqs/galaxy/histories_datasets_vs_collections.md b/faqs/galaxy/histories_datasets_vs_collections.md
@@ -0,0 +1,33 @@
+---
+title: Datasets versus collections
+description: Explanation of why collections are needed and what they are
+area: collections, histories
+box_type: tip
+layout: faq
+contributors: [nekrut]
+---
+
+**Datasets versus collections**
+
+In Galaxy's history, datasets can be present as individual entries or they can be combined into *Collections*. Why do we need collections? Collections combine multiple individual
+datasets into a single entity which is easy to manage. Galaxy tools can use collections directly as inputs. Collections can be **simple** or **nested**.
+
+**Simple collections**
+
+Imagine that you've uploaded a hundred FASTQ files corresponding to a hundred samples. These will appear as a hundred individual datasets in your history making it very long.
+But the chances are that when you analyze these data you will do the same thing on each dataset.
+
+To simplify this process you can combine all hundred datasets into a single entity called a *dataset collection* (or simply a *collection* or a *list*). It will appear as a single box in your history making it much easier to understand. Galaxy tools are designed to take collections as inputs. So, for example, if you want to map each of these datasets against a reference genome using, say, {% tool Minimap2 %}, you will need to provide `minimap2` with just one input, the collection, and it will automatically start 100 jobs behind the scenes and will combine all outputs into a single collection containing BAM files.
+
+![A simple collection is a container containing individual datasets]({% link shared/images/simple_collection.svg %})
+
+There are a number of situations when simple collections are not sufficient to reflect the complexity of the data. To deal with this situation Galaxy allows for **nested** collections.
+
+**Nested collections**
+
+Probably the most common example of this is paired-end data, where each sample is represented by two files: one containing forward reads and another containing reverse reads. In Galaxy you can create a **nested** collection that reflects the hierarchy of the data. In the case of paired data Galaxy supports **paired** collections.
+
+![A paired collection is a container containing individual datasets and preserving their hierarchy]({% link shared/images/paired_collection.svg %})
+
+<!-- Original editable image for simple collections = https://docs.google.com/drawings/d/1A-tRerNLzC4FJfShUFT327wMvSSX4Y8AAdaxD_Fwaa0/edit?usp=sharing -->
+<!-- Original editable image for paired collection = https://docs.google.com/drawings/d/1Bbx4UmIYdDAqK3KSm6LtLQ8zwEXxDglHmVbb0MK-mbQ/edit?usp=sharing -->
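Note on the collections FAQ above: the idea that a tool given one collection runs once per element and gathers the results into a new collection can be sketched in a few lines. This is a conceptual illustration only; `map_over` and `fake_mapper` are hypothetical names, and the dictionaries merely stand in for Galaxy collections.

```python
def map_over(tool, collection):
    """Run `tool` once per element and keep each output under the same identifier."""
    return {identifier: tool(dataset) for identifier, dataset in collection.items()}


def fake_mapper(fastq_name):
    """Hypothetical stand-in for a read-mapping job: FASTQ name in, BAM name out."""
    return fastq_name.replace(".fastq", ".bam")


# A simple collection (a list) of one hundred samples.
samples = {f"sample_{i}": f"sample_{i}.fastq" for i in range(1, 101)}
bams = map_over(fake_mapper, samples)

# A paired collection for one sample: the forward/reverse hierarchy is preserved.
paired_sample = {"forward": "sample_1_R1.fastq", "reverse": "sample_1_R2.fastq"}

print(len(bams), bams["sample_1"])
```

The caller supplies a single input, the collection, and gets back a single output collection with the same element identifiers, which mirrors the behaviour the FAQ describes.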