
Commit

galaxy update
SamueleSoraggi committed Jun 28, 2024
1 parent 16af6f4 commit be2f169
Showing 1 changed file (`galaxy/galaxy.qmd`) with 22 additions and 17 deletions.
Through `Galaxy`, we build a workflow applying tools to the data.

![](./images/tools.png)

:::{.callout-warning title="Use the search bar"}

Galaxy offers many tools. Use the search bar at the top of the tool panel to quickly find the programs we need.

:::


#### Quality control

**1** Run FastQC on the PacBio Hifi reads and on two of the Illumina RNA-seq libraries. FastQC performs quality control on raw sequence data and gives an overview that can help identify problems to address before further analysis.

In the tool search bar, find `FastQC` and choose `FASTQC read quality reports`. A window with *tool parameters* opens: for the first option (raw read data from your current history), choose multiple files and select `Hifi_reads_white_clover.fastq` plus the other `fastq` files whose quality you want to check (example in the figure below). Then click the `Run Tool` button.

![](./images/fastqc.png)

New items are added to your history: for each input file, `FastQC` produces a raw text output (`RawData`) and a webpage report. The outputs are ready when they turn green; click the *eye symbol* of a history item to view it.
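
To illustrate what FastQC reports, the sketch below computes one of its metrics, the mean quality score of each read, in plain Python. It assumes four-line FASTQ records with Phred+33 quality encoding. The function names `read_fastq` and `mean_phred` and the two toy reads are hypothetical and are not part of FastQC.

```python
# Minimal sketch (not FastQC itself): mean Phred quality per read.
# Assumes 4-line FASTQ records and Phred+33 quality encoding.
import io
from typing import Iterator, TextIO


def read_fastq(handle: TextIO) -> Iterator[tuple[str, str, str]]:
    """Yield (read_id, sequence, quality) for each 4-line FASTQ record."""
    while True:
        header = handle.readline().rstrip()
        if not header:
            return  # end of file
        sequence = handle.readline().rstrip()
        handle.readline()  # the '+' separator line is ignored
        quality = handle.readline().rstrip()
        yield header[1:].split()[0], sequence, quality


def mean_phred(quality: str, offset: int = 33) -> float:
    """Average Phred score of a quality string (0.0 for an empty string)."""
    if not quality:
        return 0.0
    return sum(ord(c) - offset for c in quality) / len(quality)


# Two hypothetical toy reads: one high-quality, one low-quality
toy_fastq = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\n#+5?\n"
)
for read_id, sequence, quality in read_fastq(toy_fastq):
    print(read_id, len(sequence), mean_phred(quality))
```

This prints `read1 4 40.0` and `read2 4 15.5`. Each quality character encodes one base's Phred score as its ASCII code minus 33.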

**2** FastQC produces a separate report for each sample. To compare the *Hifi* and *Illumina* data more easily, combine the three `FastQC` reports into one using `MultiQC`.

Find the tool called `MultiQC aggregate results from [...]`. In the options, select `FastQC` as the tool that generated the logs. Then select the `RawData` outputs of `FastQC` from your history (figure below). This feeds the previous reports into the new tool, building a small pipeline. Now click `Run Tool`.

![](./images/multiqc.png)

The tool is now running in your history. When it is done, click the *eye symbol* to see the report.

:::{.callout-tip}

You can find a “Help” button that offers additional information about the plots for each panel.

:::
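
The aggregation step can be pictured with a small sketch. It reads the plain-text `fastqc_data.txt` layout that FastQC writes (the content behind the `RawData` items) and keeps the `Basic Statistics` values and each module's pass/warn/fail flag. It then lines the samples up in one table, roughly like a MultiQC summary table. The parser, the sample names and the toy report contents are hypothetical, and MultiQC's own FastQC module does considerably more.

```python
# Minimal sketch of MultiQC-style aggregation of FastQC RawData (fastqc_data.txt).
# Assumes the layout ">>Module name<TAB>status" ... ">>END_MODULE"; toy data is hypothetical.
import io
from typing import TextIO


def parse_fastqc(handle: TextIO) -> dict[str, str]:
    """Return Basic Statistics values plus the status of every module."""
    summary: dict[str, str] = {}
    module = None
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith(">>END_MODULE"):
            module = None
        elif line.startswith(">>"):
            module, status = line[2:].split("\t", 1)
            summary[module] = status
        elif module == "Basic Statistics" and line and not line.startswith("#"):
            key, value = line.split("\t", 1)
            summary[key] = value
    return summary


def aggregate(reports: dict[str, TextIO], fields: list[str]) -> list[list[str]]:
    """One header row, then one row per sample with the requested fields."""
    rows = [["Sample", *fields]]
    for sample, handle in reports.items():
        summary = parse_fastqc(handle)
        rows.append([sample, *(summary.get(field, "NA") for field in fields)])
    return rows


def toy_report(total: int, gc: int, quality_status: str) -> io.StringIO:
    """Build a tiny, hypothetical fastqc_data.txt for demonstration."""
    return io.StringIO(
        "##FastQC\t0.12.1\n"
        ">>Basic Statistics\tpass\n"
        "#Measure\tValue\n"
        f"Total Sequences\t{total}\n"
        f"%GC\t{gc}\n"
        ">>END_MODULE\n"
        f">>Per base sequence quality\t{quality_status}\n"
        "#Base\tMean\n"
        "1\t38.0\n"
        ">>END_MODULE\n"
    )


table = aggregate(
    {"hifi_sample": toy_report(1000, 35, "pass"),
     "illumina_sample": toy_report(5000, 44, "warn")},
    ["Total Sequences", "%GC", "Per base sequence quality"],
)
for row in table:
    print("\t".join(row))
```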

<summary>Questions:
<p>

#### Hifi Data Alignment


**3** Map the PacBio Hifi reads (`Hifi_reads_white_clover.fastq`) to the white clover reference sequence (Contigs 1 and 2) using `minimap2` (Map with minimap2). Find the `minimap2` tool. In the options, change `Use a built-in genome index` to the option that uses a genome from your history and builds an index from it. Then choose `DNA_Contig1_2.fasta` as the reference sequence.

![](./images/map-pb.png)

Under the profile with preset options, choose `PacBio HiFi reads vs reference mapping (map-hifi)`. Then click `Run Tool`.

**4** Run the same alignment again, but choose the preset `Long assembly to reference mapping. Up to 20% divergence (asm20)`.


Then rename the two alignments using the edit function (*pen symbol* in the history), for example as `Contig1_2_maphifi` and `Contig1_2_asm20`, so that each name records both the reference and the alignment preset.
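
For readers who also work on the command line, the sketch below prints shell commands that roughly correspond to the two Galaxy runs. It assumes a local, recent minimap2 installation, where `-a` requests SAM output and `-x` selects a preset (`map-hifi` or `asm20`). The Galaxy wrapper may add further options, and the helper `minimap2_cmd` is hypothetical.

```python
# Hypothetical helper: command-line equivalents of the two Galaxy minimap2 runs.
# Assumes minimap2 is installed; -a = SAM output, -x = preset.
import shlex

REFERENCE = "DNA_Contig1_2.fasta"
READS = "Hifi_reads_white_clover.fastq"

# Output name (as suggested in the renaming step) -> minimap2 preset
RUNS = {"Contig1_2_maphifi": "map-hifi", "Contig1_2_asm20": "asm20"}


def minimap2_cmd(preset: str, reference: str, reads: str) -> list[str]:
    """Argument list for 'minimap2 -a -x PRESET REFERENCE READS'."""
    if preset not in RUNS.values():
        raise ValueError(f"unexpected preset: {preset}")
    return ["minimap2", "-a", "-x", preset, reference, reads]


for name, preset in RUNS.items():
    cmd = minimap2_cmd(preset, REFERENCE, READS)
    # To actually run it (requires minimap2):
    # with open(f"{name}.sam", "w") as out:
    #     subprocess.run(cmd, stdout=out, check=True)
    print(f"{shlex.join(cmd)} > {name}.sam")
```

Keeping the preset in the output name follows the renaming advice in step 4.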

**5** The alignments are not sorted by coordinate. Sort them using `Samtools sort`: in the options, select both aligned files with multiple selection, then click `Run Tool`.
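
A pure-Python sketch over SAM text shows what "sorted by coordinate" means. Records are ordered by reference sequence (in the order of the `@SQ` header lines) and then by 1-based leftmost position. Unmapped reads (`RNAME` = `*`) are placed last. This simplifies what `samtools sort` actually does, since samtools also handles BAM compression, its own tie-breaking and header updates. The function name, contig names and reads in the example are hypothetical.

```python
# Simplified sketch of coordinate sorting of SAM records (not samtools itself).
# Assumes tab-separated SAM text with @SQ header lines; example data is hypothetical.
def coordinate_sort(sam_lines: list[str]) -> list[str]:
    header = [line for line in sam_lines if line.startswith("@")]
    records = [line for line in sam_lines if line and not line.startswith("@")]

    # Reference order is taken from the @SQ lines, as in the BAM header
    ref_order: dict[str, int] = {}
    for line in header:
        if line.startswith("@SQ"):
            tags = dict(field.split(":", 1) for field in line.split("\t")[1:])
            ref_order[tags["SN"]] = len(ref_order)

    def sort_key(record: str) -> tuple[int, int]:
        columns = record.split("\t")
        rname, pos = columns[2], int(columns[3])
        if rname == "*":  # unmapped reads go to the end
            return (len(ref_order), 0)
        return (ref_order[rname], pos)

    # sorted() is stable, so ties keep their input order
    return header + sorted(records, key=sort_key)


sam = [
    "@HD\tVN:1.6\tSO:unsorted",
    "@SQ\tSN:contig1\tLN:1000",
    "@SQ\tSN:contig2\tLN:800",
    "r1\t0\tcontig2\t50\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tcontig1\t300\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t16\tcontig1\t20\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
]
for line in coordinate_sort(sam):
    if not line.startswith("@"):
        qname, _, rname, pos = line.split("\t")[:4]
        print(qname, rname, pos)
```

An index (`bai`) can only be built for a coordinate-sorted BAM file, which is one reason the sorting step comes before downloading the alignments for IGV.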

**6** Download the two sorted alignments to your computer. To do so, click the *disk symbol* of each file in your history and download both the dataset (alignments in `bam` format) and its index file (`bai` format). Also download the reference genome in `fasta` format (`DNA_Contig1_2.fasta` from the history).

**7** Open IGV on your computer. Load the reference first: go to `Genomes --> Load Genome from File` and select the `fasta` file you downloaded. Then load the two alignments: go to `File --> Load from File` and select each `bam` file together with its `bai` file. You can now inspect the alignments by choosing a region of the genome and zooming in.

![](./images/IGV.png)

Your sequences will be substituted by two elements in your history.

![](./images/builtlist.png)

**10** Align the RNA-seq lists of raw reads to the reference `DNA_Contig1_2.fasta` using `STAR Gapped-read mapper for RNA-seq data`. In the options use:

- as data, the parameter `Paired-end (as collection)`, and then choose one of the two collections (you cannot run both at once)
- as reference, `DNA_Contig1_2.fasta`, with the length of the SA pre-indexing string set to `9`
- as the gene model for the index, use `white_clover_genes.gtf`
- as output, `Per gene read counts (GeneCounts)`.
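
For small genomes, the STAR manual recommends scaling down the SA pre-indexing length (`--genomeSAindexNbases`) to min(14, log2(GenomeLength)/2 - 1). The sketch below applies that rule to the total length of a FASTA reference, rounding down (a common convention, assumed here). The helper names are hypothetical. The tutorial does not state the contig lengths, so the call on `DNA_Contig1_2.fasta` is left as a comment.

```python
# Sketch of the STAR manual's rule for small genomes:
#   genomeSAindexNbases = min(14, log2(GenomeLength) / 2 - 1), rounded down here.
import io
import math
from typing import TextIO


def fasta_length(handle: TextIO) -> int:
    """Total number of bases over all sequences in a FASTA stream."""
    return sum(len(line.strip()) for line in handle if not line.startswith(">"))


def sa_index_nbases(genome_length: int) -> int:
    """Recommended --genomeSAindexNbases for a genome of the given length."""
    return min(14, math.floor(math.log2(genome_length) / 2 - 1))


# Hypothetical usage on the tutorial reference, once downloaded:
# with open("DNA_Contig1_2.fasta") as fasta:
#     print(sa_index_nbases(fasta_length(fasta)))

toy_fasta = io.StringIO(">c1\nACGT\n>c2\nGG\n")
print(fasta_length(toy_fasta))
for length in (2**20, 2**22, 3_100_000_000):
    print(length, sa_index_nbases(length))
```

Under this rule, a value of `9` corresponds to a reference of roughly 1 to 4 Mb (from 2^20 up to just under 2^22 bases).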

**11** Use `MultiQC` to check the quality of the alignments. `STAR` writes log files that can be used for quality reports. In the `MultiQC` options, select the tool `STAR`, then `Insert STAR output`, choose `Log` as the type of output, and select the two log collections from the `STAR` alignments. Then click `Run Tool`.

![](./images/starMultiQC.png)

View the report to see the alignment statistics.
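
STAR summarises each run in a `Log.final.out` file, whose lines have the form `description |<TAB>value`, and MultiQC's `STAR` section reads these summaries. The sketch below parses that layout and extracts the uniquely mapped percentage for each sample. The parser, the sample names and the numbers in the toy logs are hypothetical, and the exact set of lines may differ between STAR versions.

```python
# Minimal sketch: read the "key |<TAB>value" lines of STAR Log.final.out files.
# Toy logs below are hypothetical; real files contain many more lines.
import io
from typing import TextIO


def parse_star_log(handle: TextIO) -> dict[str, str]:
    stats: dict[str, str] = {}
    for line in handle:
        if "|" not in line:
            continue  # section titles such as "UNIQUE READS:" have no separator
        key, value = line.split("|", 1)
        stats[key.strip()] = value.strip()
    return stats


def unique_percent(stats: dict[str, str]) -> float:
    """Uniquely mapped reads as a float, e.g. '85.00%' -> 85.0."""
    return float(stats["Uniquely mapped reads %"].rstrip("%"))


def toy_log(total: int, unique: int) -> io.StringIO:
    return io.StringIO(
        f"                          Number of input reads |\t{total}\n"
        "                                    UNIQUE READS:\n"
        f"                   Uniquely mapped reads number |\t{unique}\n"
        f"                        Uniquely mapped reads % |\t{100 * unique / total:.2f}%\n"
    )


logs = {"sample_A": toy_log(10000, 8500), "sample_B": toy_log(8000, 6000)}
for sample, handle in logs.items():
    stats = parse_star_log(handle)
    print(sample, stats["Number of input reads"], unique_percent(stats))
```

This prints `sample_A 10000 85.0` and `sample_B 8000 75.0`.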

:::{.callout-note}

Galaxy can also create an automatic workflow that maps the data, which is useful when running many samples. You can
generate a workflow from the analysis already completed in a history by going to Settings →
Extract workflow, or create a workflow from scratch using the Workflow editor.
