-
Notifications
You must be signed in to change notification settings - Fork 68
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
applying all comments from Marius, changing all Readme's accordingly,…
… and changing the Allele based workflow to be general not specific to Nanopore
- Loading branch information
Showing
13 changed files
with
38 additions
and
102 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,89 +1,30 @@ | ||
# Microbiome Workflows | ||
|
||
The following workflows can be used directly for microbiome data analysis, pathogen detection, and tracking purposes. The workflows can also be adapted to any other sequencing technique. | ||
In this directory, you will find a collection of workflows designed for microbiome data analysis, pathogen detection, and tracking. These workflows are ready to use and can be adapted for various sequencing techniques using Galaxy's customizable and automatable API. | ||
|
||
To learn more about the following workflows and try them with real datasets, please check out our Microbiome tutorials on the Galaxy Training Network [GTN](https://training.galaxyproject.org/training-material/topics/microbiome/) | ||
## Avaiable Workflows | ||
|
||
## Nanopore _Preprocessing_ | ||
- **Nanopore Preprocessing** | ||
|
||
Before starting any analysis, it is always a good idea to assess the quality of your input data and to discard poor-quality base content by trimming and filtering reads. | ||
- **Taxonomy Profiling and Visualisation with Krona** | ||
|
||
Generally, we are not interested in the host sequences, but rather only those originating from the pathogen itself. It is important to get rid of all host sequences and to only retain sequences that might include a pathogen, both in order to speed up further steps and to avoid host sequences compromising the analysis. | ||
- **Gene-based Pathogen Identification** | ||
|
||
### Input Datasets | ||
- **Allele-based Pathogen Identification** | ||
|
||
- Collection of sequenced Nanopore reads of all samples to be analysed in a `fastqsanger` or `fastqsanger.gz` format. | ||
- **Pathogen Detection: PathoGFAIR Samples Aggregation and Visualisation** | ||
|
||
### Output Datasets | ||
## Getting Started | ||
|
||
- Collection of Pre-Processed Sequenced reads of all samples, ready for further analysis with the other workflows, in a `fastqsanger` or `fastqsanger.gz` format. | ||
To learn more about these workflows and to try them with real datasets, please visit our Microbiome tutorials on the Galaxy Training Network (GTN): | ||
|
||
- Tables indicating total number of reads before and after host sequences trimming, and the host sequences percentages found in each sample. | ||
[Microbiome Tutorials on GTN](https://training.galaxyproject.org/training-material/topics/microbiome/) | ||
|
||
This workflow is available for trial through our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) | ||
|
||
## _Taxonomy Profiling_ and Visualisation with Krona | ||
## Dedicated Training Material | ||
|
||
In this workflow, we identify the different organisms found in our samples by assigning taxonomy levels to the reads starting from the kingdom level down to the species level and visualise the result. | ||
The workflows for **Nanopore Preprocessing**, **Taxonomy Profiling and Visualization with Krona**, **Gene-based Pathogen Identification**, **Allele-based Pathogen Identification**, and **Pathogen Detection: PathoGFAIR Samples Aggregation and Visualization** can all be tried out in a dedicated training material on GTN for foodborne pathogen detection and tracking: | ||
|
||
It’s important to check what might be the species of a possible pathogen to be found, it gets us closer to the investigation as well as discovering possible multiple pathogenetic infections if any existed. | ||
[GTN Tutorial for Foodborne Pathogen Detection and Tracking](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) | ||
|
||
For taxonomy profiling Kraken2 tool is used along with one of its standard databases available on Galaxy, you can freely choose between Kraken2 different databases based on your input datasets. For visualisation multiple tools can be used, Krona pie chart (as default in this workflow), Phinch interactive tool, Pavian, etc. | ||
|
||
### Input Datasets | ||
- Collection of Pre-Processed Sequenced reads of all samples, ready for further analysis with the other workflows, in a `fastqsanger` or `fastqsanger.gz` format, the output of **Nanopore Preprocessing** workflow. | ||
|
||
### Output Datasets | ||
- Taxonomy profiling Tabular file, visualisation figures and interactive pie charts. | ||
|
||
This workflow is available for trial through our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) | ||
|
||
## Gene-based Pathogen Identification | ||
|
||
In this workflow, we determine whether the samples are pathogenic or not, by looking for genes known to be linked to pathogenicity or to the pathogenecity character of the organism. | ||
|
||
- Virulence Factor (VF): gene products, usually proteins, involved in pathogenicity. By identifying them, we can call a pathogen and its severity level | ||
|
||
- Antimicrobial Resistance genes (AMR). | ||
|
||
These type of genes have three fundamental mechanisms of antimicrobial resistance that are enzymatic degradation of antibacterial drugs, alteration of bacterial proteins that are antimicrobial targets, and changes in membrane permeability to antibiotics, which will lead to not altering the target site and spread throughput the pathogenic bacteria decreasing the overall fitness of the host. | ||
|
||
In this workflow we: | ||
|
||
1. Perform genome assembly to get contigs, i.e. longer sequences, using metaflye (Kolmogorov et al. 2020) then assembly polishing using medaka consensus pipeline and visualizing the assembly graph using Bandage Image (Wick et al. 2015) | ||
2. Generate reports with AMR genes and VF using ABRicate | ||
|
||
### Input Datasets | ||
- Collection of Pre-Processed Sequenced reads of all samples, ready for further analysis with the other workflows, in a `fastqsanger` or `fastqsanger.gz` format, the output of **Nanopore Preprocessing** workflow. | ||
|
||
### Output Datasets | ||
- FASTA and Tabular files to track genes and visualise our pathogenic identification through out all samples. | ||
|
||
This workflow is available for trial through our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) | ||
|
||
## Nanopore Allele-based Pathogen Identification | ||
|
||
This workflow identifies pathogens using an allelic approach, detecting Single Nucleotide Polymorphisms (SNPs) to track emerging variants, i.e. markers showing evolutionary histories of homogeneous strains. This process includes SNP calling, aimed at identifying novel pathogen strains and elucidating discrepancies compared to reference sequences, thereby facilitating the tracking of emerging variants. Within this workflow, both variants and SNPs are discerned, serving as crucial elements for subsequent pathogen identification and variant tracking purposes. | ||
|
||
To perform the mapping step before variant identification, we used the Minimap2 tool, specifically designed for Nanopore reads. If you're working with Illumina data, you can substitute Minimap2 with Bowtie2. | ||
|
||
### Input Datasets | ||
- Collection of Pre-Processed Sequenced reads of all samples, ready for further analysis with the other workflows, in a `fastq or fastq.gz` format, the output of **Nanopore Preprocessing** workflow. | ||
- A reference genome to the tested pathogen. | ||
|
||
### Output Datasets | ||
- VCF files indicating identified variants and SNPs, BAM files with mapping results, and Tabular files with mapping depth and coverage calculations. | ||
|
||
This workflow is available for trial through our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) | ||
|
||
## Pathogen Detection: PathoGFAIR Samples Aggregation and Visualisation | ||
|
||
In this workflow, we will aggregate results and use the results from 3 workflows (**Nanopore Preprocessing**, **Gene-based Pathogen Identification** and **Nanopore Allele-based Pathogen Identification**) to help track pathogens among samples and visualise all performed analysis by: | ||
|
||
1. Drawing a presence-absence heatmap of the identified VF genes within all samples to visualise in which samples these genes can be found. | ||
2. Drawing a phylogenetic tree for each pathogenic genes detected, where we will relate the contigs of the samples together where this gene is found. | ||
3. Plotting QC reads, host reads, mapping coverage and depth, and SNP analysis. | ||
|
||
With these types of visualisations, we can have an overview of all samples and the genes, but also how samples are related to each other, which common pathogenic genes they share. Given the time of the sampling and the location one can easily identify using these graphs, where and when the contamination has occurred among the different samples. | ||
|
||
This workflow is available for trial through our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
2 changes: 1 addition & 1 deletion
2
...e-based-Pathogen-Identification-tests.yml → ...e-based-Pathogen-Identification-tests.yml
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
8 changes: 3 additions & 5 deletions
8
workflows/microbiome/nanopore-allele-based-pathogen-identification/README.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,14 +1,12 @@ | ||
# Nanopore _Allele-based Pathogen Identification_ | ||
# Allele-based Pathogen Identification | ||
|
||
This workflow identifies pathogens using an allelic approach, detecting Single Nucleotide Polymorphisms (SNPs) to track emerging variants, i.e. markers showing evolutionary histories of homogeneous strains. This process includes SNP calling, aimed at identifying novel pathogen strains and elucidating discrepancies compared to reference sequences, thereby facilitating the tracking of emerging variants. Within this workflow, both variants and SNPs are discerned, serving as crucial elements for subsequent pathogen identification and variant tracking purposes. | ||
|
||
To perform the mapping step before variant identification, we used the Minimap2 tool, specifically designed for Nanopore reads. If you're working with Illumina data, you can substitute Minimap2 with Bowtie2. | ||
|
||
## Input Datasets | ||
- Collection of Pre-Processed Sequenced reads of all samples, ready for further analysis with the other workflows, in a `fastq or fastq.gz` format, the output of **Nanopore _Preprocessing_** workflow. | ||
- Collection of Pre-Processed Sequenced reads of all samples, ready for further analysis with the other workflows, in a `fastqsanger or fastqsanger.gz` format, the output of **Nanopore Preprocessing** workflow. | ||
- A reference genome to the tested pathogen. | ||
|
||
## Output Datasets | ||
- VCF files indicating identified variants and SNPs, BAM files with mapping results, and Tabular files with mapping depth and coverage calculations. | ||
|
||
This workflow is available for trial through our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) | ||
This workflow is available to try via our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,17 +1,17 @@ | ||
# Nanopore _Preprocessing_ | ||
# Nanopore Preprocessing | ||
|
||
Before starting any analysis, it is always a good idea to assess the quality of your input data and to discard poor-quality base content by trimming and filtering reads. | ||
|
||
Generally, we are not interested in the host sequences, but rather only those originating from the pathogen itself. It is important to get rid of all host sequences and to only retain sequences that might include a pathogen, both in order to speed up further steps and to avoid host sequences compromising the analysis. | ||
|
||
## Input Datasets | ||
|
||
- Collection of sequenced Nanopore reads of all samples to be analysed in a `fastq or fastq.gz` format. | ||
- Collection of sequenced Nanopore reads of all samples to be analysed in a `fastqsanger` or `fastqsanger.gz` format. | ||
|
||
## Output Datasets | ||
|
||
- Collection of Pre-Processed Sequenced reads of all samples, ready for further analysis with the other workflows, in a `fastq or fastq.gz` format. | ||
- Collection of Pre-Processed Sequenced reads of all samples, ready for further analysis with the other workflows, in a `fastqsanger` or `fastqsanger.gz` format. | ||
|
||
- Tables indicating total number of reads before and after host sequences trimming, and the host sequences percentages found in each sample. | ||
|
||
This workflow is available for trial through our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) | ||
This workflow is available to try via our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
6 changes: 3 additions & 3 deletions
6
...e/pathogen-detection-pathogfair-samples-aggregation-and-visualisation/README.md
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,11 +1,11 @@ | ||
# Pathogen Detection: _PathoGFAIR Samples Aggregation and Visualisation_ | ||
# Pathogen Detection: PathoGFAIR Samples Aggregation and Visualisation | ||
|
||
In this workflow, we will aggregate results and use the results from 3 workflows (**Nanopore _Preprocessing_**, **_Gene-based Pathogen Identification_** and **Nanopore _Allele-based Pathogen Identification_**) to help track pathogens among samples and visualise all performed analysis by: | ||
In this workflow, we will aggregate results and use the results from 3 workflows (**Nanopore Preprocessing**, **Gene-based Pathogen Identification** and **Nanopore Allele-based Pathogen Identification**) to help track pathogens among samples and visualise all performed analysis by: | ||
|
||
1. Drawing a presence-absence heatmap of the identified VF genes within all samples to visualise in which samples these genes can be found. | ||
2. Drawing a phylogenetic tree for each pathogenic genes detected, where we will relate the contigs of the samples together where this gene is found. | ||
3. Plotting QC reads, host reads, mapping coverage and depth, and SNP analysis. | ||
|
||
With these types of visualisations, we can have an overview of all samples and the genes, but also how samples are related to each other, which common pathogenic genes they share. Given the time of the sampling and the location one can easily identify using these graphs, where and when the contamination has occurred among the different samples. | ||
|
||
This workflow is available for trial through our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) | ||
This workflow is available to try via our [GTN tutorial](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Oops, something went wrong.