Merge pull request #4846 from nekrut/history_tutorial_update
History tutorial update
nekrut authored Jul 2, 2024
2 parents 3e2fb66 + 9a7bb99 commit b81bd55
Showing 41 changed files with 473 additions and 164 deletions.
15 changes: 14 additions & 1 deletion _config.yml
Original file line number Diff line number Diff line change
@@ -137,7 +137,11 @@ icon-tag:
galaxy-eye: far fa-eye
galaxy-gear: fas fa-cog
galaxy-history: fas fa-columns
galaxy-dataset-collapse: fa fa-compress
galaxy-history-archive: fas fa-archive
galaxy-history-size: fas fa-database
galaxy-history-storage-choice: fas fa-hard-drive
galaxy-history-refresh: fas fa-arrows-rotate
galaxy-history-input: fas fa-sign-in-alt
galaxy-history-answer: fas fa-sign-out-alt
galaxy-home: fas fa-home
@@ -173,6 +177,7 @@ icon-tag:
help: far fa-question-circle
history-annotate: fas fa-comment
history-share: fas fa-share-alt
history-select-multiple: fas fa-check-square
instances: fas fa-globe
interactive_tour: fas fa-magic
keypoints: fas fa-key
@@ -237,6 +242,14 @@ icon-tag:
zenodo_link: far fa-copy
version: fas fa-code-commit
rating: far fa-star
dataset-rerun: fas fa-arrow-rotate-right
dataset-related-datasets: fas fa-sitemap
dataset-visualize: fas fa-chart-bar
dataset-save: fas fa-save
dataset-link: fas fa-link
dataset-question: fas fa-question
dataset-info: fas fa-info-circle
dataset-undelete: fas fa-trash-can-arrow-up

# To exclude in _site
exclude:
@@ -279,7 +292,7 @@ plugins:
- jekyll-feed
- jekyll-redirect-from

# An announcement to display on the home page
# An announcement to play on the home page
#announcement:
# class: success
# title: GTN Celebrates Pride Month
30 changes: 27 additions & 3 deletions faqs/galaxy/datasets_add_tag.md
@@ -4,13 +4,37 @@ description: Tags can help you to better organize your history and track dataset
area: datasets
layout: faq
box_type: tip
contributors: [bebatut,wm75,hexylena,shiltemann]
contributors: [bebatut,wm75,hexylena,shiltemann,nekrut]
---

Datasets can be tagged. This simplifies the tracking of datasets across the Galaxy interface. Tags can contain any combination of letters or numbers but cannot contain spaces.

**To tag a dataset**:

1. Click on the dataset to expand it
2. Click on **Add Tags** {% icon galaxy-tags %}
3. Add a tag {% if include.tag %}named `{{include.tag}}` {% else %} starting with `#` {% endif %}
- Tags starting with `#` will be automatically propagated to the outputs of tools using this dataset.
3. Add {% if include.tag %} a tag named `{{include.tag}}`{% else %} tag text{% endif %}. Tags starting with `#` will be automatically propagated to the outputs of tools using this dataset (see below).
4. Press <kbd>Enter</kbd>
5. Check that the tag appears below the dataset name

**Tags beginning with `#` are special!**

They are called **Name tags**. The unique feature of these tags is that they *propagate*: if a dataset is labelled with a name tag, all derivatives (children) of this dataset will automatically inherit this tag (see below).
The figure below explains why this is so useful. Consider the following analysis (numbers in parentheses correspond to dataset numbers in the figure below):

1. a set of forward and reverse reads (datasets 1 and 2) is mapped against a reference using {% tool Bowtie2 %} generating dataset 3;
1. dataset 3 is used to calculate read coverage using {% tool BedTools Genome Coverage %} *separately* for `+` and `-` strands. This generates two datasets (4 and 5 for plus and minus, respectively);
1. datasets 4 and 5 are used as inputs to {% tool Macs2 broadCall %}, generating datasets 6 and 8;
1. datasets 6 and 8 are intersected with coordinates of genes (dataset 9) using {% tool BedTools Intersect %} generating datasets 10 and 11.

![A history without name tags versus history with name tags]({% link shared/images/histories_why_nametags.svg %})

Now consider that this analysis is done without name tags. This is shown on the left side of the figure. It is hard to trace which datasets contain "plus" data versus "minus" data. For example, does dataset 10 contain "plus" data or "minus" data? Probably "minus" but are you sure? In the case of a small history like the one shown here, it is possible to trace this manually but as the size of a history grows it will become very challenging.

The right side of the figure shows exactly the same analysis, but using name tags. When the analysis was conducted, datasets 4 and 5 were tagged with `#plus` and `#minus`, respectively. When they were used as inputs to Macs2, the resulting datasets 6 and 8 automatically inherited these tags, and so on. As a result, it is straightforward to trace both branches (plus and minus) of this analysis.
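
The propagation rule can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Galaxy's implementation: `Dataset`, `run_tool`, and the dataset names are hypothetical and exist only for illustration. The single rule it encodes is that an output inherits every tag starting with `#` found on any of its inputs, while ordinary tags stay behind.

```python
from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Toy stand-in for a history dataset (hypothetical, for illustration only)."""
    name: str
    tags: set = field(default_factory=set)


def run_tool(name, inputs):
    """Create an output dataset that inherits the name tags of its inputs.

    Only tags starting with '#' propagate; ordinary tags stay on the input.
    """
    inherited = {tag for ds in inputs for tag in ds.tags if tag.startswith("#")}
    return Dataset(name, inherited)


# Two branches, tagged once at the coverage step.
coverage_plus = Dataset("coverage +", {"#plus", "checked"})
coverage_minus = Dataset("coverage -", {"#minus"})
genes = Dataset("genes")

peaks_plus = run_tool("peaks", [coverage_plus])
peaks_minus = run_tool("peaks", [coverage_minus])
genes_plus = run_tool("intersect", [peaks_plus, genes])
genes_minus = run_tool("intersect", [peaks_minus, genes])

print(sorted(genes_plus.tags), sorted(genes_minus.tags))
```

In this sketch both final datasets carry the name "intersect", yet their tags alone are enough to tell the two branches apart.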

More information is in a [dedicated #nametag tutorial]({% link topics/galaxy-interface/tutorials/name-tags/tutorial.md %}).


<!-- Image is here = https://docs.google.com/drawings/d/1iiNsau6ddiE2MV9qMyekUq2mrpDHHcc02bXtcFEAnhY/edit?usp=sharing -->

41 changes: 41 additions & 0 deletions faqs/galaxy/datasets_deleting.md
@@ -0,0 +1,41 @@
---
title: How to delete datasets?
area: datasets
box_type: tip
layout: faq
contributors: [nekrut]
---

**Deleting datasets individually**

To delete datasets individually, simply click the {% icon galaxy-delete %} button within the dataset's box. That's it! This action is reversible: datasets can be undeleted.

**Deleting datasets in bulk**

To delete multiple datasets at once:

- Click the {% icon history-select-multiple %} icon at the top of the history pane;
- Select the datasets you want to delete;
- Click the dropdown that appears at the top of the history;
- Select the **"Delete"** option.

This action is also reversible: datasets can be undeleted.

![An animated gif showing how to delete datasets]({% link shared/images/datasets_deleting.gif %})

**Deleting datasets permanently** <font color="red">{% icon warning %} <b>Danger zone!</b></font>

> <warning-title>Permanent is ... PERMANENT!</warning-title>
> Datasets deleted in this fashion CANNOT be undeleted!
{: .warning}

To delete multiple datasets <font color="red">PERMANENTLY</font>:

- Click the {% icon history-select-multiple %} icon at the top of the history pane;
- Select the datasets you want to delete;
- Click the dropdown that appears at the top of the history;
- Select the **"Delete (permanently)"** option.




19 changes: 11 additions & 8 deletions faqs/galaxy/datasets_hidden.md
@@ -1,15 +1,18 @@
---
title: How to unhide "hidden datasets"?
title: How to hide datasets?
area: datasets
box_type: tip
layout: faq
contributors: [jennaj, beachyesh]
contributors: [jennaj, beachyesh, nekrut]
---

If you have run a workflow with hidden datasets, in your History:
- Click the **gear icon** {% icon galaxy-gear %} → Click **Unhide Hidden Datasets**
- Or use the toggle ``hidden`` to view them
To hide datasets:

- Click the {% icon history-select-multiple %} icon at the top of the history pane;
- Select the datasets you want to hide;
- Click the dropdown that appears at the top of the history;
- Select the **"Hide"** option.

![An animated gif showing how to hide datasets]({% link shared/images/datasets_hide.gif %})


When using the [Copy Datasets]({% link faqs/galaxy/histories_copy_dataset.md %}) feature, hidden datasets will not be available to transfer from the **Source History** list of datasets. To include them:
1. Click the **gear icon** {% icon galaxy-gear %} → Click **Unhide Hidden Datasets**
2. Click the **gear icon** {% icon galaxy-gear %} → Click **Copy Datasets**
18 changes: 18 additions & 0 deletions faqs/galaxy/datasets_multiple.md
@@ -0,0 +1,18 @@
---
title: Manipulating multiple history datasets
description: Explains how to manipulate multiple history datasets at once
area: histories
layout: faq
box_type: tip
contributors: [nekrut]
---

You can also hide, delete, and purge multiple datasets at once by **multi-selecting datasets**:

1. {% icon galaxy-selector %} Click the multi-select button containing the checkbox just below the history size.
2. Checkboxes will appear inside each dataset in the history.
3. Scroll and click the checkboxes next to the datasets you want to manage.
4. Click the 'n of N selected' to choose the action. The action will be performed on all selected datasets, except for the ones that don't support the action. That is, if an action doesn't apply to a selected dataset, like deleting a deleted dataset, nothing will happen to that dataset, while all other selected datasets will be deleted.
5. You can click the multi-select button again to hide the checkboxes.
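
The "skip what does not apply" behaviour in step 4 can be illustrated with a small Python sketch. It is a simplified model, not Galaxy code: the `bulk_delete` function and the dictionary layout are assumptions made for this example.

```python
def bulk_delete(datasets, selected_ids):
    """Mark selected datasets as deleted, skipping ones already deleted.

    `datasets` maps a dataset id to a dict with a boolean 'deleted' key.
    Returns the ids that were actually changed.
    """
    changed = []
    for ds_id in selected_ids:
        ds = datasets[ds_id]
        if ds["deleted"]:
            # The action does not apply, so this dataset is left untouched.
            continue
        ds["deleted"] = True
        changed.append(ds_id)
    return changed


history = {
    1: {"name": "reads", "deleted": False},
    2: {"name": "mapped", "deleted": True},  # already deleted
    3: {"name": "coverage", "deleted": False},
}
changed = bulk_delete(history, [1, 2])
print(changed)
```

Here datasets 1 and 2 are selected, but only dataset 1 changes state; dataset 2 was already deleted and dataset 3 was never selected.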

![Operating on multiple datasets]({% link faqs/galaxy/images/multiselect.gif %})
24 changes: 24 additions & 0 deletions faqs/galaxy/datasets_undelete.md
@@ -0,0 +1,24 @@
---
title: How to un-delete datasets?
area: datasets
box_type: tip
layout: faq
contributors: [nekrut]
---

If your history contains deleted datasets, you will see the {% icon galaxy-delete %} **"Include deleted"** button directly above the dataset display.

To un-delete datasets:

- Type `deleted:true` in the search box;
- Select the datasets you want to un-delete;
- Click the dropdown that appears at the top of the history;
- Select the **"Undelete"** option.

![An animated gif showing how to undelete datasets]({% link shared/images/datasets_undeleting.gif %})

Alternatively, you can:

- click the {% icon galaxy-delete %} **"Include deleted"** button directly above the dataset display. This will cause deleted datasets to appear in the history along with normal (un-deleted) datasets;
- find the deleted dataset, which is distinguished by the {% icon dataset-undelete %} icon within its box, and click this icon to un-delete it.

24 changes: 24 additions & 0 deletions faqs/galaxy/datasets_unhidden.md
@@ -0,0 +1,24 @@
---
title: How to un-hide datasets?
area: datasets
box_type: tip
layout: faq
contributors: [nekrut]
---

If your history contains hidden datasets, you will see the {% icon galaxy-show-hidden %} **"Include hidden"** button directly above the dataset display.

To un-hide datasets:

- Type `visible:hidden` in the search box;
- Select the datasets you want to un-hide;
- Click the dropdown that appears at the top of the history;
- Select the **"Unhide"** option.

![An animated gif showing how to unhide datasets]({% link shared/images/datasets_unhide.gif %})

Alternatively, you can:

- click the {% icon galaxy-show-hidden %} **"Include hidden"** button directly above the dataset display. This will cause hidden datasets to appear in the history along with normal (un-hidden) datasets;
- find the hidden dataset, which is distinguished by the {% icon galaxy-show-hidden %} icon within its box, and click this icon to un-hide it.

24 changes: 24 additions & 0 deletions faqs/galaxy/histories_annotation.md
@@ -0,0 +1,24 @@
---
title: History annotation
description: Explains how to annotate a history
area: histories
box_type: tip
layout: faq
contributors: [nekrut]
---

Sometimes tags and names are not enough to describe the work done within a history. Galaxy allows you to create history
annotations: longer text entries that allow for more formatting options. The formatting of the text is preserved. Later, if
you publish or share the history, the annotation will be displayed automatically - allowing you to share additional
notes about the analysis. Multiple lines, spaces, and emoji! 😹🏳️‍⚧️🌈 can be used while writing annotations.

To annotate a history:

1. Click on {% icon galaxy-pencil %} (**Edit**) next to the history name. A larger text section will appear displaying any
existing annotation or `Annotation (optional)` if empty.
2. Add your text. <kbd>Enter</kbd> will move the cursor to the next line. (Tabs cannot be
typed since the <kbd>Tab</kbd> key is used to switch between controls on the page; tabs can be pasted in, however.)
3. Click on **Save** {% icon galaxy-save %}.
4. To cancel, click the {% icon galaxy-undo %} "Cancel" button.

![UI for annotating histories]({% link shared/images/history_annotations.png %})
5 changes: 2 additions & 3 deletions faqs/galaxy/histories_create_new.md
@@ -7,9 +7,8 @@ layout: faq
contributors: [bebatut,wm75,shiltemann,hexylena,nomadscientist,nsoranzo,nekrut]
---

Click the {% icon new-history %} icon at the top of the history panel:
To create a new history, simply click the {% icon new-history %} icon at the top of the history panel:

![UI for creating new history]({% link shared/images/history_create_new.svg %})


<!-- the original drawing can be found here https://docs.google.com/drawings/d/1cCBrLAo4kDGic5QyB70rRiWJAKTenTU8STsKDaLcVU8/edit?usp=sharing -->
<!-- the original drawing can be found here https://docs.google.com/drawings/d/1cCBrLAo4kDGic5QyB70rRiWJAKTenTU8STsKDaLcVU8/edit?usp=sharing -->
33 changes: 33 additions & 0 deletions faqs/galaxy/histories_dataset_colors.md
@@ -0,0 +1,33 @@
---
title: Dataset colors
description: Explains meaning of dataset colors in Galaxy's history
area: histories
box_type: tip
layout: faq
contributors: [nekrut]
---

There are several different "states" a dataset can be in. These states are indicated by colors:

![Colors indicating states of Galaxy datasets]({% link shared/images/galactic_colors.svg %})

- **ok**: everything is fine, life is good;
- **new**: the dataset was just created. Galaxy does not yet know what it is;
- **queued**: indicates that the job generating this dataset is scheduled for execution but not running yet;
- **running**: job generating this dataset is running;
- **setting metadata**: when a new dataset is uploaded Galaxy examines it to understand what kind of data it is (e.g., BAM, FASTQ, fasta, BED, etc.). This is called "setting metadata";
- **deferred**: sometimes it does not make sense to upload the dataset until it is needed for an analysis. Galaxy will download **deferred** datasets later during the job execution. Those datasets do not count toward your quota;
- **paused**: in some cases, such as workflow executions, upstream errors prevent subsequent jobs from starting, leaving their output datasets in the "paused" state;
- **discarded**: something went wrong; for example, the job producing this dataset might have been cancelled;
- **error**: everything is not fine; life is bad!
- **placeholder**: similar to "new"; we know something will be there but are not yet sure what;
- **failed populated state**: this refers to collections (not individual datasets). Here a collection has failed to be populated with datasets;
- **new populated state**: this refers to collections (not individual datasets). A collection was created but not populated yet.
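
As a rough aid for reading these states, the Python sketch below sorts them into "wait", "needs attention", and "ready". The state names come from the list above; the grouping and the `advise` helper are assumptions made for illustration, not something Galaxy itself provides.

```python
# State names are taken from the list above; the grouping is an assumption.
WAIT = {"new", "queued", "running", "setting metadata", "placeholder",
        "new populated state"}
NEEDS_ATTENTION = {"error", "paused", "discarded", "failed populated state"}


def advise(state):
    """Return a short, informal suggestion for a dataset or collection state."""
    if state == "ok":
        return "ready"
    if state == "deferred":
        # Deferred datasets are fetched later, during job execution.
        return "ready (downloaded on demand)"
    if state in WAIT:
        return "wait"
    if state in NEEDS_ATTENTION:
        return "needs attention"
    raise ValueError(f"unknown state: {state!r}")


for state in ("ok", "running", "paused"):
    print(state, "->", advise(state))
```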

<!-- original editable image = https://docs.google.com/drawings/d/1F2Lq1m3cMIckvCexXMzug-dqwgifkoMOzoyH4VcoVX0/edit?usp=sharing -->

<!-- TO DO
Needs to be linked to FAQs on:
- how to report errors
- explaining collections
-->
52 changes: 52 additions & 0 deletions faqs/galaxy/histories_dataset_item.md
@@ -0,0 +1,52 @@
---
title: Dataset snippet
description: Describes features of a single dataset element in the history
area: histories
box_type: tip
layout: faq
contributors: [nekrut]
---

A single Galaxy dataset can either be "collapsed" or "expanded".

**Collapsed dataset view**

Datasets in the panel are initially shown in a "collapsed" view:

![Collapsed view of a single Galaxy dataset]({% link shared/images/history_item_collapsed.png %})

It contains the following elements:

- **Dataset number**: ("1") order of dataset in the history;
- **Dataset name**: ("M117-bl_1.fq.gz") its name;
- {% icon galaxy-eye %}: click this to view the dataset contents;
- {% icon galaxy-pencil %}: click this to edit dataset properties;
- {% icon galaxy-delete %}: click this to delete the dataset from the history (*don't worry*, you can undo this action!).

Clicking on a collapsed dataset will expand it.

> <details-title>Some buttons can be disabled.</details-title>
> Some of the buttons above may be disabled if the dataset is in a state that doesn't allow the
> action. For example, the 'edit' button is disabled for datasets that are still queued or running.
>
{: .details}

**Expanded dataset view**

Expanded dataset view adds a preview element and many additional controls.

![Expanded view of a single Galaxy dataset]({% link shared/images/history_item_expanded.png %})

In addition to the elements described above for the collapsed dataset, its expanded view contains:

- **Add tags** {% icon galaxy-tags %}: click on this to tag this dataset;
- **Dataset size**: ("2 variants, 18 comments") lists the size of the dataset. When datasets are small (like in this example) the exact size is shown. For large datasets, Galaxy gives an approximate estimate.
- **format**: ("VCF") lists the datatype;
- **database**: ("?") lists which genome build this dataset corresponds to. This usually lists "?" unless the genome build is set explicitly or the dataset is derived from another dataset with defined genome build information;
- **info field**: ("INFO [2024-03-26 12:08:53,435]...") displays information provided by the tool that generated this dataset. This varies widely and depends on the type of job that generated this dataset.
- {% icon dataset-save %}: Saves dataset to disk;
- {% icon dataset-link %}: Copies dataset link into clipboard;
- {% icon dataset-info %}: Displays additional details about the dataset in the center pane;
- {% icon dataset-rerun %}: Reruns job that generated this dataset. This button is unavailable for datasets uploaded into history because they were not produced by a Galaxy tool;
- {% icon dataset-visualize %}: Displays visualization options for this dataset. The list of options is dependent on the datatype;
- {% icon dataset-related-datasets %}: Shows datasets related to this dataset. This is useful for tracking down parental datasets - those that were used as inputs into a job that produced this particular dataset.
33 changes: 33 additions & 0 deletions faqs/galaxy/histories_datasets_vs_collections.md
@@ -0,0 +1,33 @@
---
title: Datasets versus collections
description: Explanation of why collections are needed and what they are
area: collections, histories
box_type: tip
layout: faq
contributors: [nekrut]
---

**Datasets versus collections**

In Galaxy's history, datasets can be present as individual entries or they can be combined into *Collections*. Why do we need collections? Collections combine multiple individual
datasets into a single entity which is easy to manage. Galaxy tools can use collections directly as inputs. Collections can be **simple** or **nested**.

**Simple collections**

Imagine that you've uploaded a hundred FASTQ files corresponding to a hundred samples. These will appear as a hundred individual datasets in your history making it very long.
But the chances are that when you analyze these data you will do the same thing on each dataset.

To simplify this process you can combine all hundred datasets into a single entity called a *dataset collection* (or simply a *collection* or a *list*). It will appear as a single box in your history, making it much easier to understand. Galaxy tools are designed to take collections as inputs. So, for example, if you want to map each of these datasets against a reference genome using, say, {% tool Minimap2 %}, you will need to provide `minimap2` with just one input, the collection, and it will automatically start 100 jobs behind the scenes and combine all outputs into a single collection containing BAM files.
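
The "one input, one hundred jobs, one output collection" idea can be sketched in Python. This is only a conceptual model: `map_over` and `fake_mapper` are hypothetical names, and the "mapping" step merely renames files rather than running a real aligner.

```python
def map_over(tool, collection):
    """Run `tool` once per element and gather the outputs into a new collection.

    `collection` maps an element identifier to a dataset (here, a file name).
    """
    return {identifier: tool(dataset) for identifier, dataset in collection.items()}


def fake_mapper(fastq_name):
    """Hypothetical placeholder for a mapping job; it only renames the file."""
    return fastq_name.replace(".fastq", ".bam")


# One collection holding a hundred samples.
samples = {f"sample_{i}": f"sample_{i}.fastq" for i in range(1, 101)}

# A single call stands in for the hundred jobs started behind the scenes.
bams = map_over(fake_mapper, samples)
print(len(bams), bams["sample_1"])
```

The output collection keeps the same element identifiers as the input, which is what lets each result be traced back to its sample.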

![A simple collection is a container containing individual datasets]({% link shared/images/simple_collection.svg %})

There are a number of situations in which simple collections are not sufficient to reflect the complexity of the data. To deal with this, Galaxy allows for **nested** collections.

**Nested collections**

Probably the most common example of this is paired-end data, where each sample is represented by two files: one containing forward reads and another containing reverse reads. In Galaxy you can create a **nested** collection that reflects the hierarchy of the data. In the case of paired data Galaxy supports **paired** collections.
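
One way to picture a nested collection of paired data is as a dictionary of dictionaries. The Python sketch below is an illustration under assumptions: the sample and file names are made up, and `flatten` is a hypothetical helper, not a Galaxy function.

```python
# A list of pairs: each sample holds exactly one forward and one reverse dataset.
list_of_pairs = {
    "sample_1": {"forward": "sample_1_R1.fastq", "reverse": "sample_1_R2.fastq"},
    "sample_2": {"forward": "sample_2_R1.fastq", "reverse": "sample_2_R2.fastq"},
}


def flatten(nested):
    """Yield (sample, direction, dataset) for every dataset in a list of pairs."""
    for sample, pair in nested.items():
        for direction in ("forward", "reverse"):
            yield sample, direction, pair[direction]


rows = list(flatten(list_of_pairs))
for row in rows:
    print(row)
```

Keeping the pairing inside the structure means a tool never has to guess which reverse file belongs to which forward file.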

![A paired collection is a container containing individual datasets and preserving their hierarchy]({% link shared/images/paired_collection.svg %})

<!-- Original editable image for simple collections = https://docs.google.com/drawings/d/1A-tRerNLzC4FJfShUFT327wMvSSX4Y8AAdaxD_Fwaa0/edit?usp=sharing -->
<!-- Original editable image for paired collection = https://docs.google.com/drawings/d/1Bbx4UmIYdDAqK3KSm6LtLQ8zwEXxDglHmVbb0MK-mbQ/edit?usp=sharing -->