Add microbiome analysis workflows (#182)
* adding microbiome analysis workflows to IWC with test data

* adding Changelog, README and dockstore yml file

* solving linting issues

* solving linting error by correcting the file name, since I forgot the underscore in its name before

* applying all comments

* adding workflows for the collection version

* applying Wolfgang's comments; removing the decompress tool is still missing, to be added once the PR of the tool update is merged

* updating the preprocessing workflow, replacing the decompressing step with an update to Krakentool, and also pushing the latest updates to the workflows; the planemo tests still fail for the same reasons, I need help with that

* renaming all inputs and outputs to contain no spaces, using underscores and lowercase instead, to solve the linting issue, and adding more description to the all-samples readme file

* changing the tested files to be the last file in the collection, to solve the testing issues, since the github tests always compare the tested file with the last file in the collection regardless of the file name

* removing the word collection from the folder names, since all workflows in this PR work with collection no need to specify that in the folder name

* testing another file for SNP workflow to check the testing error

* updating the SNP workflow to solve the testing issue

* Second attempt to solve SNP test error

* attempt 1 for solving the taxonomy profiling test failure

* solving taxonomy profiling testing error

* updating the general readme file to include the training material of the workflows

* using MinusB standard Kraken2 database for taxonomy profiling, trying to solve the issue of database size, explained in the readme that database can be changed based on the input datasets within Galaxy

* correcting a file name

* correcting a typo

* reducing test dataset sample size by keeping only the reads that are later used by the workflows to detect virulence factors, to make tests faster

* updating the preprocessing workflow to remove host sequences in a more reliable way, and trying to solve the gene based workflow testing error on github

* updating workflows attempt 1 to solve current errors

* removing filter failed datasets tool, attempt to solve the testing failures of genebased pathogen detection and preprocessing workflows

* removing attribute from the genebased test

* trial to solve gene based workflow github test failure

* attempt 2 to solve the test failure

* updating some workflows

* correcting a preprocessing workflow test output file

* trying to solve genebased test failure

* another attempt to solve the testing error, which the log reports as if something is wrong with line 1 of the workflow ga file

* another trial

* solving the taxonomy profiling testing failure due to the standardPF database size, where we now use an input parameter instead, thank you so much

* updating workflows based on our latest paper update April 2024

* updating workflows to include the tools latest version in Galaxy eu

* correcting a test file

* updating all readme files for workflows

* adding workflow comments, arranging the main folder and workflows names, to make it opened for any microbiome workflows

* editing readme and workflows namings and tags based on Berenice's comments

* solving linting problem

* fixing a few typos

* removing a test file which is no longer produced by the updated workflow, solving the linting issue

* I failed to change the test file correctly in the last push, here we go this time :D

* updated workflow reports, and adding the total number of reads before and after host removal to the Multiqc output of the preprocessing workflow

* solving the error of MultiQC including the host reads removal details, now the table is included and MultiQC runs correctly :)

* adding a step in the 5th workflow to remove all failed datasets from the collections coming from genebased pathogen identification, which may happen when no contigs are found by metaFlye for some samples

* adding more filtering steps that remove failed and empty datasets from collections, to protect the workflow from carry-over errors that might occur for samples with few contigs, VF genes and AMR genes found

* leaving the optional minimap2 input that selects the samples profile for the user to choose

* correcting a comment typo in the preprocessing workflow

* adding a 5in1 workflow named PathoGFAIR that groups all other 5 workflows of PathoGFAIR in one workflow

* removing the 5 in 1 workflow and adding tags to main workflow outputs to help users track their histories

* correcting typos in readmes

* removing hashtags from tags to make them normal ones not promoted ones

* Apply suggestions from code review

Co-authored-by: Marius van den Beek <m.vandenbeek@gmail.com>

* applying all comments from Marius, changing all Readme's accordingly, and changing the Allele based workflow to be general not specific to Nanopore

* updating readmes to explain in more detail what trying the workflows means

* adding Zenodo links to all fastq or fasta files

* updating Zenodo with the correct test file names

* updating a test file name in the gene based test

* correcting other test data names

* updating resulted test files with the updated file names

* correcting from path to location for the zenodo tags and adding the file types

* nothing changed I am just trying again

* changing file type for fastq files from fastq to fastqsanger

* using the same fastq test file in Allele based, gene based and taxonomy profiling

* updating Preprocessing workflow, to take the reference genome of the host as a user input, and updated the test accordingly

* removing bar chart tool, to add the report visualisations later

* replacing the testing reference genome since ce6 is giving an error

* Thanks to Björn, who solved the module issue; I am trying now, hope it works this time :)

* 2 docker containers have been fixed, thanks to Björn

---------

Co-authored-by: Marius van den Beek <m.vandenbeek@gmail.com>
EngyNasr and mvdbeek committed Jun 25, 2024
1 parent 389d7c9 commit b8dac7d
Showing 45 changed files with 8,585 additions and 0 deletions.
28 changes: 28 additions & 0 deletions workflows/microbiome/README.md
@@ -0,0 +1,28 @@
# Microbiome Workflows

In this directory, you will find a collection of workflows designed for microbiome data analysis, pathogen detection, and tracking. These workflows are ready to use and can be adapted for various sequencing techniques using Galaxy's customizable and automatable API.

## Available Workflows

- **Nanopore Preprocessing**

- **Taxonomy Profiling and Visualisation with Krona**

- **Gene-based Pathogen Identification**

- **Allele-based Pathogen Identification**

- **Pathogen Detection: PathoGFAIR Samples Aggregation and Visualisation**

## Getting Started

To learn more about these workflows and to try them with real datasets, please visit our Microbiome tutorials on the Galaxy Training Network (GTN):

[Microbiome Tutorials on GTN](https://training.galaxyproject.org/training-material/topics/microbiome/)


## Dedicated Training Material

The workflows for **Nanopore Preprocessing**, **Taxonomy Profiling and Visualisation with Krona**, **Gene-based Pathogen Identification**, **Allele-based Pathogen Identification**, and **Pathogen Detection: PathoGFAIR Samples Aggregation and Visualisation** can all be tried out in a dedicated GTN tutorial on foodborne pathogen detection and tracking:

[GTN Tutorial for Foodborne Pathogen Detection and Tracking](https://training.galaxyproject.org/training-material/topics/microbiome/tutorials/pathogen-detection-from-nanopore-foodborne-data/tutorial.html)
@@ -0,0 +1,18 @@
version: 1.2
workflows:
- name: main
  subclass: Galaxy
  publish: true
  primaryDescriptorPath: /Allele-based-Pathogen-Identification.ga
  testParameterFiles:
  - /Allele-based-Pathogen-Identification-tests.yml
  authors:
  - name: Engy Nasr
    orcid: 0000-0001-9047-4215
    url: https://orcid.org/0000-0001-9047-4215
  - name: "Bérénice Batut"
    orcid: 0000-0001-9852-1987
    url: https://orcid.org/0000-0001-9852-1987
  - name: Paul Zierep
    orcid: 0000-0003-2982-388X
    url: https://orcid.org/0000-0003-2982-388X
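
The `authors` entries above pair each `orcid` with a matching `url`. As a minimal sketch of how such entries could be sanity-checked before pushing, the Python below verifies the ORCID check character (ISO 7064 MOD 11-2, as used by ORCID iDs) and that the URL agrees with the identifier. The helper names `orcid_check_character` and `author_entry_problems` are hypothetical and not part of Dockstore or IWC tooling; the author data is copied from the file as a plain Python list rather than parsed from YAML.

```python
import re

# Four hyphen-separated groups; the last character may be the letter X.
ORCID_PATTERN = re.compile(r"^\d{4}-\d{4}-\d{4}-\d{3}[\dX]$")


def orcid_check_character(base_digits: str) -> str:
    """Return the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def author_entry_problems(author: dict) -> list[str]:
    """Hypothetical helper: list inconsistencies in one author entry."""
    name = author.get("name", "<unnamed>")
    orcid = author.get("orcid", "")
    if not ORCID_PATTERN.match(orcid):
        return [f"{name}: malformed ORCID {orcid!r}"]
    problems = []
    digits = orcid.replace("-", "")
    if orcid_check_character(digits[:15]) != digits[15]:
        problems.append(f"{name}: ORCID check character mismatch")
    if author.get("url") != f"https://orcid.org/{orcid}":
        problems.append(f"{name}: url does not match orcid")
    return problems


# Author entries copied from the file above.
authors = [
    {"name": "Engy Nasr", "orcid": "0000-0001-9047-4215",
     "url": "https://orcid.org/0000-0001-9047-4215"},
    {"name": "Bérénice Batut", "orcid": "0000-0001-9852-1987",
     "url": "https://orcid.org/0000-0001-9852-1987"},
    {"name": "Paul Zierep", "orcid": "0000-0003-2982-388X",
     "url": "https://orcid.org/0000-0003-2982-388X"},
]

for author in authors:
    print(author["name"], author_entry_problems(author) or "ok")
```

A check like this only catches typos in the identifier; it does not confirm that the ORCID record belongs to the named person.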
@@ -0,0 +1,22 @@
- doc: Test outline for Allele-based-Pathogen-Identification
  job:
    reference_genome_of_tested_strain:
      class: File
      location: https://zenodo.org/record/12190648/files/reference_genome_of_tested_strain.fasta.gz
      filetype: fasta.gz
    collection_of_preprocessed_samples:
      class: Collection
      collection_type: list
      elements:
      - class: File
        identifier: nanopore_preprocessed_collection_of_all_samples_Spike3bBarcode10
        location: https://zenodo.org/record/12190648/files/nanopore_preprocessed_collection_of_all_samples_Spike3bBarcode10.fastq.gz
        filetype: fastqsanger.gz
      - class: File
        identifier: nanopore_preprocessed_collection_of_all_samples_Spike3bBarcode12
        location: https://zenodo.org/record/12190648/files/nanopore_preprocessed_collection_of_all_samples_Spike3bBarcode12.fastq.gz
        filetype: fastqsanger.gz
    samples_profile: null
  outputs:
    mapping_mean_depth_per_sample:
      path: test-data/mapping_mean_depth_per_sample.tabular
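
Several commits in this series fixed the test inputs by switching from `path` to `location` for the Zenodo files and by adding explicit file types. The following is a rough sketch of a pre-push check for that convention, assuming the `job` section has already been loaded into a Python dict (it is written out literally here to avoid a YAML dependency). The functions `iter_files` and `job_problems` are hypothetical helpers, and the rule they enforce is this PR's convention for remote inputs, not a check performed by planemo itself.

```python
def iter_files(node):
    """Yield every File entry, descending into Collection elements."""
    if not isinstance(node, dict):
        return  # parameters such as `samples_profile: null` are skipped
    if node.get("class") == "File":
        yield node
    elif node.get("class") == "Collection":
        for element in node.get("elements", []):
            yield from iter_files(element)


def job_problems(job: dict) -> list[str]:
    """Hypothetical helper: report File inputs lacking location or filetype."""
    problems = []
    for input_name, value in job.items():
        for file_entry in iter_files(value):
            label = file_entry.get("identifier", input_name)
            if "location" not in file_entry:
                problems.append(f"{label}: no 'location' URL")
            if "filetype" not in file_entry:
                problems.append(f"{label}: no 'filetype'")
    return problems


base = "https://zenodo.org/record/12190648/files/"
sample_prefix = "nanopore_preprocessed_collection_of_all_samples_"

# The job section of the test file above, as a plain dict.
job = {
    "reference_genome_of_tested_strain": {
        "class": "File",
        "location": base + "reference_genome_of_tested_strain.fasta.gz",
        "filetype": "fasta.gz",
    },
    "collection_of_preprocessed_samples": {
        "class": "Collection",
        "collection_type": "list",
        "elements": [
            {
                "class": "File",
                "identifier": sample_prefix + barcode,
                "location": base + sample_prefix + barcode + ".fastq.gz",
                "filetype": "fastqsanger.gz",
            }
            for barcode in ("Spike3bBarcode10", "Spike3bBarcode12")
        ],
    },
    "samples_profile": None,
}

print(job_problems(job))
```

An empty list means every file input in the job declares both keys; an entry that still uses `path` without a `filetype` would be reported twice, once per missing key.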