diff --git a/docs/source/assets/executions-tui.png b/docs/source/assets/executions-tui.png new file mode 100644 index 00000000..d803cbec Binary files /dev/null and b/docs/source/assets/executions-tui.png differ diff --git a/docs/source/assets/latch-exec.png b/docs/source/assets/latch-exec.png new file mode 100644 index 00000000..9530e099 Binary files /dev/null and b/docs/source/assets/latch-exec.png differ diff --git a/docs/source/basics/draft.md b/docs/source/basics/draft.md new file mode 100644 index 00000000..0b088cf1 --- /dev/null +++ b/docs/source/basics/draft.md @@ -0,0 +1,311 @@ +# Parking Lot + +Formally, a workflow can be described as a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG), where each +node in the graph is called a task. This computational graph is a flexible model +to describe most any bioinformatics analysis. + +In this example, a workflow ingests sequencing files in FastQ format and +produces a sorted assembly file. The workflow's DAG has two tasks. The first +task turns the FastQ files into a single BAM file using an assembly algorithm. +The second task sorts the assembly from the first task. The final output is a +useful assembly conducive to downstream analysis and visualization in tools like +[IGV](https://software.broadinstitute.org/software/igv/). + +The Latch SDK lets you define your workflow tasks as python functions. +The parameters in the function signature define the task inputs and return +values define the task outputs. The body of the function holds the task logic, +which can be written in plain python or can be subprocessed through a +program/library in any language. + +```python +@small_task +def assembly_task(read1: LatchFile, read2: LatchFile) -> LatchFile: + + # A reference to our output. 
+ sam_file = Path("covid_assembly.sam").resolve() + + _bowtie2_cmd = [ + "bowtie2/bowtie2", + "--local", + "-x", + "wuhan", + "-1", + read1.local_path, + "-2", + read2.local_path, + "--very-sensitive-local", + "-S", + str(sam_file), + ] + + subprocess.run(_bowtie2_cmd) + + return LatchFile(str(sam_file), "latch:///covid_assembly.sam") +``` + +These tasks are then "glued" together in another function that represents the +workflow. The workflow function body simply chains the task functions by calling +them and passing returned values to downstream task functions. Notice that our +workflow function calls the task that we just defined, `assembly_task`, as well +as another task we can assume was defined elsewhere, `sort_bam_task`. + +You must not write actual logic in the workflow function body. It can only be +used to call task functions and pass task function return values to downstream +task functions. Additionally all task functions must be called with keyword +arguments. You also cannot access variables directly in the workflow function; +in the example below, you would not be able to pass in `read1=read1.local_path`. + +```python +@workflow +def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile: + + sam = assembly_task(read1=read1, read2=read2) + return sort_bam_task(sam=sam) +``` + +Workflow function docstrings also contain markdown formatted +documentation and a DSL to specify the presentation of parameters when the +workflow interface is generated. We'll add this content to the docstring of the +workflow function we just wrote. + +```python +@workflow +def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile: + """Description... + + markdown header + ---- + + Write some documentation about your workflow in + markdown here: + + > Regular markdown constructs work as expected. 
+
+    # Heading
+
+    * content1
+    * content2
+
+    __metadata__:
+        display_name: Assemble and Sort FastQ Files
+        author:
+            name:
+            email:
+            github:
+        repository:
+        license:
+            id: MIT
+
+    Args:
+
+        read1:
+          Paired-end read 1 file to be assembled.
+
+          __metadata__:
+            display_name: Read1
+
+        read2:
+          Paired-end read 2 file to be assembled.
+
+          __metadata__:
+            display_name: Read2
+    """
+
+    sam = assembly_task(read1=read1, read2=read2)
+    return sort_bam_task(sam=sam)
+```
+
+## Workflow Code Structure
+
+So far we have defined workflows and tasks as python functions, but we don't
+know where to put them or what supplementary files might be needed to run the
+code on the Latch platform.
+
+Workflow code needs to live in a directory with three necessary
+elements:
+
+* a file named `Dockerfile` that defines the computing environment of your tasks
+* a file named `version` that holds the plaintext version of the workflow
+* a directory named `wf` that holds the python code needed for the workflow;
+  the task and workflow functions must live in a `wf/__init__.py` file
+
+These three elements must be named as specified above. The directory should have
+the following structure:
+
+```text
+├── Dockerfile
+├── version
+└── wf
+    └── __init__.py
+```
+
+The SDK ships with easily retrievable example workflow code. Just type
+`latch init myworkflow` to construct a directory structured as above for
+reference or boilerplate.
+
+### Example `Dockerfile`
+
+**Note**: you are required to use our base image for the time being.
+
+```Dockerfile
+FROM 812206152185.dkr.ecr.us-west-2.amazonaws.com/latch-base:9a7d-main
+
+# It's easy to build binaries from source that you can later reference as
+# subprocesses within your workflow.
+RUN curl -L https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.4.4/bowtie2-2.4.4-linux-x86_64.zip/download -o bowtie2-2.4.4.zip &&\
+    unzip bowtie2-2.4.4.zip &&\
+    mv bowtie2-2.4.4-linux-x86_64 bowtie2
+
+# Or use managed library distributions through the container OS's package
+# manager.
+RUN apt-get update -y &&\
+    apt-get install -y autoconf samtools
+
+
+# You can use local data to construct your workflow image. Here we copy a
+# pre-indexed reference to a path that our workflow can reference.
+COPY data /root/reference
+ENV BOWTIE2_INDEXES="reference"
+
+COPY wf /root/wf
+
+# STOP HERE:
+# The following lines are needed to ensure your build environment works
+# correctly with latch.
+ARG tag
+ENV FLYTE_INTERNAL_IMAGE $tag
+RUN sed -i 's/latch/wf/g' flytekit.config
+RUN python3 -m pip install --upgrade latch
+WORKDIR /root
+```
+
+### Example `version` File
+
+You can use any versioning scheme that you would like, as long as each
+registration has a unique version value. We recommend sticking with [semantic
+versioning](https://semver.org/).
+
+```text
+v0.0.0
+```
+
+### Example `wf/__init__.py` File
+
+```python
+import subprocess
+from pathlib import Path
+
+from latch import small_task, workflow
+from latch.types import LatchFile
+
+
+@small_task
+def assembly_task(read1: LatchFile, read2: LatchFile) -> LatchFile:
+
+    # A reference to our output.
+ sam_file = Path("covid_assembly.sam").resolve() + + _bowtie2_cmd = [ + "bowtie2/bowtie2", + "--local", + "-x", + "wuhan", + "-1", + read1.local_path, + "-2", + read2.local_path, + "--very-sensitive-local", + "-S", + str(sam_file), + ] + + subprocess.run(_bowtie2_cmd) + + return LatchFile(str(sam_file), "latch:///covid_assembly.sam") + + +@small_task +def sort_bam_task(sam: LatchFile) -> LatchFile: + + bam_file = Path("covid_sorted.bam").resolve() + + _samtools_sort_cmd = [ + "samtools", + "sort", + "-o", + str(bam_file), + "-O", + "bam", + sam.local_path, + ] + + subprocess.run(_samtools_sort_cmd) + + return LatchFile(str(bam_file), "latch:///covid_sorted.bam") + + +@workflow +def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile: + """Description... + + markdown header + ---- + + Write some documentation about your workflow in + markdown here: + + > Regular markdown constructs work as expected. + + # Heading + + * content1 + * content2 + + __metadata__: + display_name: Assemble and Sort FastQ Files + author: + name: + email: + github: + repository: + license: + id: MIT + + Args: + + read1: + Paired-end read 1 file to be assembled. + + __metadata__: + display_name: Read1 + + read2: + Paired-end read 2 file to be assembled. + + __metadata__: + display_name: Read2 + """ + sam = assembly_task(read1=read1, read2=read2) + return sort_bam_task(sam=sam) +``` + +## What happens at registration? + +Now that we've defined our functions, we are ready to register our workflow with +the [LatchBio](https://latch.bio) platform. This will give us: + +* a no-code interface +* managed cloud infrastructure for workflow execution +* a dedicated API endpoint for programmatic execution +* hosted documentation +* parallelized CSV-to-batch execution + +To register, we type `latch register ` into our terminal (where +directory_name is the name of the directory holding our code, Dockerfile and +version file). 
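Because every registration needs a version value distinct from all previous ones, some authors script the bump rather than editing the `version` file by hand. A minimal sketch using only the standard library (`bump_patch` is a hypothetical helper, not part of the SDK):

```python
from pathlib import Path


def bump_patch(version_file: str = "version") -> str:
    """Bump the patch number of a semver-like value, e.g. 'v0.0.0' -> 'v0.0.1'."""
    raw = Path(version_file).read_text().strip()
    prefix = "v" if raw.startswith("v") else ""
    major, minor, patch = raw.removeprefix("v").split(".")
    bumped = f"{prefix}{major}.{minor}.{int(patch) + 1}"
    # Write the new value back so the next `latch register` picks it up.
    Path(version_file).write_text(bumped + "\n")
    return bumped
```

Running a helper like this before re-registering guarantees a fresh version value; any other scheme works too, as long as the value changes.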
+ +The registration process requires a local installation of Docker. + +To re-register changes, make sure you update the value in the version file. (The +value of the version is not important, only that it is distinct from previously +registered versions). \ No newline at end of file diff --git a/docs/source/basics/local_development.md b/docs/source/basics/local_development.md index 122b4836..7933d999 100644 --- a/docs/source/basics/local_development.md +++ b/docs/source/basics/local_development.md @@ -1,31 +1,13 @@ # Local Development -Executing workflows on the LatchBio platform is heavily encouraged for -consistent behavior. +Executing workflows on the LatchBio platform is heavily encouraged for consistent behavior. -Workflows often deal with enormous files that are too large for local -development environments and sometimes require computing resources that cannot -be accommodated by local machines or are just unavailable (eg. GPUs). Thus, -there are many cases when local executions with smaller files or reduced -resources may behave differently than on properly configured cloud -infrastructure. Local execution should never be a substitute for testing -workflow logic on the platform itself. +Workflows often deal with enormous files that are too large for local development environments and sometimes require computing resources that cannot be accommodated by local machines or are just unavailable (eg. GPUs). Thus, there are many cases when local executions with smaller files or reduced resources may behave differently than on properly configured cloud infrastructure. Local execution should never be a substitute for testing workflow logic on the platform itself. However, the ability to quickly iterate and debug task logic locally is certainly useful for teasing out many types of bugs. -Using a `if __name__ == "__main__":` clause is a useful way to tag local -function calls with sample values. 
Running `python3 wf/__init__.py` will become
-an entrypoint for quick debugging.
-
-To run the same entrypoint _within_ the latest registered container, one can run
-`latch local-execute `. This gives the same confidence in
-reproducible behavior one would usually receive post registration but with the
-benefits of fast local development. Note that workflow code is
-mounted inside the latest container build so that rebuilds are not consistently
-made with rapid changes.
-
-More information [here](https://docs.latch.bio/subcommands.html#latch-local-execute).
+Using an `if __name__ == "__main__":` clause is a useful way to tag local function calls with sample values. Running `python3 wf/__init__.py` will become an entrypoint for quick debugging. Here is an example of a minimal `wf/__init__.py` file that demonstrates local execution:
diff --git a/docs/source/basics/remote_execution.md b/docs/source/basics/remote_execution.md
index c15565f9..03a60cf7 100644
--- a/docs/source/basics/remote_execution.md
+++ b/docs/source/basics/remote_execution.md
@@ -4,326 +4,28 @@
 It is frequently desirable to be able to access a shell from within a running
 task of a workflow, to debug a misbehaving program or inspect some files for
 example.
 
-_This feature is in alpha, please contact hannah@latch.bio to gain access._
-
 When inspecting a running task in the Console (console.latch.bio), simply click
 on the node representing the desired task and copy and paste the
 `latch exec ` subcommand in the right sidebar into your terminal to
 retrieve a live shell from within the running task.
 
----
-
-Formally, a workflow can be described as a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG), where each
-node in the graph is called a task. This computational graph is a flexible model
-to describe most any bioinformatics analysis.
-
-In this example, a workflow ingests sequencing files in FastQ format and
-produces a sorted assembly file.
The workflow's DAG has two tasks. The first -task turns the FastQ files into a single BAM file using an assembly algorithm. -The second task sorts the assembly from the first task. The final output is a -useful assembly conducive to downstream analysis and visualization in tools like -[IGV](https://software.broadinstitute.org/software/igv/). - -The Latch SDK lets you define your workflow tasks as python functions. -The parameters in the function signature define the task inputs and return -values define the task outputs. The body of the function holds the task logic, -which can be written in plain python or can be subprocessed through a -program/library in any language. +![latch exec](../assets/latch-exec.png) +The shell session is accessible as long as the task is executing. For short-lived tasks, you can use the **Start**, **Stop** options on the sidebar to pause a task. Alternatively, you can also programmatically sleep a task like so: ```python -@small_task -def assembly_task(read1: LatchFile, read2: LatchFile) -> LatchFile: +import time - # A reference to our output. - sam_file = Path("covid_assembly.sam").resolve() - - _bowtie2_cmd = [ - "bowtie2/bowtie2", - "--local", - "-x", - "wuhan", - "-1", - read1.local_path, - "-2", - read2.local_path, - "--very-sensitive-local", - "-S", - str(sam_file), - ] - - subprocess.run(_bowtie2_cmd) - - return LatchFile(str(sam_file), "latch:///covid_assembly.sam") +@task +def very_short_task(a: int, b: int) -> int: + time.sleep(300) # Sleep the task for 5 minutes + return a + b ``` -These tasks are then "glued" together in another function that represents the -workflow. The workflow function body simply chains the task functions by calling -them and passing returned values to downstream task functions. Notice that our -workflow function calls the task that we just defined, `assembly_task`, as well -as another task we can assume was defined elsewhere, `sort_bam_task`. 
- -You must not write actual logic in the workflow function body. It can only be -used to call task functions and pass task function return values to downstream -task functions. Additionally all task functions must be called with keyword -arguments. You also cannot access variables directly in the workflow function; -in the example below, you would not be able to pass in `read1=read1.local_path`. - -```python -@workflow -def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile: - - sam = assembly_task(read1=read1, read2=read2) - return sort_bam_task(sam=sam) -``` - -Workflow function docstrings also contain markdown formatted -documentation and a DSL to specify the presentation of parameters when the -workflow interface is generated. We'll add this content to the docstring of the -workflow function we just wrote. - -```python -@workflow -def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile: - """Description... - - markdown header - ---- - - Write some documentation about your workflow in - markdown here: - - > Regular markdown constructs work as expected. - - # Heading - - * content1 - * content2 - - __metadata__: - display_name: Assemble and Sort FastQ Files - author: - name: - email: - github: - repository: - license: - id: MIT - - Args: - - read1: - Paired-end read 1 file to be assembled. - - __metadata__: - display_name: Read1 - - read2: - Paired-end read 2 file to be assembled. - - __metadata__: - display_name: Read2 - """ - - sam = assembly_task(read1=read1, read2=read2) - return sort_bam_task(sam=sam) -``` - -## Workflow Code Structure - -So far we have defined workflows and tasks as python functions but we don't know -where to put them or what supplementary files might be needed to run the code on -the Latch platform. 
- -Workflow code needs to live in directory with three necessary -elements: - -* a file named `Dockerfile` that defines the computing environment of your tasks -* a file named `version` that holds the plaintext version of the workflow -* a directory named `wf` that holds the python code needed for the workflow. -* task and workflow functions must live in a `wf/__init__.py` file - -These three elements must be named as specified above. The directory should have -the following structure: - -```text -├── Dockerfile -├── version -└── wf - └── __init__.py -``` - -The SDK ships with easily retrievable example workflow code. Just type -`latch init myworkflow` to construct a directory structured as above for -reference or boilerplate. - -### Example `Dockerfile` - -**Note**: you are required to use our base image for the time being. - -```Dockerfile -FROM 812206152185.dkr.ecr.us-west-2.amazonaws.com/latch-base:9a7d-main - -# Its easy to build binaries from source that you can later reference as -# subprocesses within your workflow. -RUN curl -L https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.4.4/bowtie2-2.4.4-linux-x86_64.zip/download -o bowtie2-2.4.4.zip &&\ - unzip bowtie2-2.4.4.zip &&\ - mv bowtie2-2.4.4-linux-x86_64 bowtie2 - -# Or use managed library distributions through the container OS's package -# manager. -RUN apt-get update -y &&\ - apt-get install -y autoconf samtools - - -# You can use local data to construct your workflow image. Here we copy a -# pre-indexed reference to a path that our workflow can reference. -COPY data /root/reference -ENV BOWTIE2_INDEXES="reference" - -COPY wf /root/wf - -# STOP HERE: -# The following lines are needed to ensure your build environement works -# correctly with latch. 
-ARG tag -ENV FLYTE_INTERNAL_IMAGE $tag -RUN sed -i 's/latch/wf/g' flytekit.config -RUN python3 -m pip install --upgrade latch -WORKDIR /root -``` - -### Example `version` File - -You can use any versioning scheme that you would like, as long as each register -has a unique version value. We recommend sticking with [semantic -versioning](https://semver.org/). - -```text -v0.0.0 -``` - -### Example `wf/__init__.py` File - -```python -import subprocess -from pathlib import Path - -from latch import small_task, workflow -from latch.types import LatchFile - - -@small_task -def assembly_task(read1: LatchFile, read2: LatchFile) -> LatchFile: - - # A reference to our output. - sam_file = Path("covid_assembly.sam").resolve() - - _bowtie2_cmd = [ - "bowtie2/bowtie2", - "--local", - "-x", - "wuhan", - "-1", - read1.local_path, - "-2", - read2.local_path, - "--very-sensitive-local", - "-S", - str(sam_file), - ] - - subprocess.run(_bowtie2_cmd) - - return LatchFile(str(sam_file), "latch:///covid_assembly.sam") - - -@small_task -def sort_bam_task(sam: LatchFile) -> LatchFile: - - bam_file = Path("covid_sorted.bam").resolve() - - _samtools_sort_cmd = [ - "samtools", - "sort", - "-o", - str(bam_file), - "-O", - "bam", - sam.local_path, - ] - - subprocess.run(_samtools_sort_cmd) - - return LatchFile(str(bam_file), "latch:///covid_sorted.bam") - - -@workflow -def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile: - """Description... - - markdown header - ---- - - Write some documentation about your workflow in - markdown here: - - > Regular markdown constructs work as expected. - - # Heading - - * content1 - * content2 - - __metadata__: - display_name: Assemble and Sort FastQ Files - author: - name: - email: - github: - repository: - license: - id: MIT - - Args: - - read1: - Paired-end read 1 file to be assembled. - - __metadata__: - display_name: Read1 - - read2: - Paired-end read 2 file to be assembled. 
- - __metadata__: - display_name: Read2 - """ - sam = assembly_task(read1=read1, read2=read2) - return sort_bam_task(sam=sam) -``` - -## What happens at registration? - -Now that we've defined our functions, we are ready to register our workflow with -the [LatchBio](https://latch.bio) platform. This will give us: - -* a no-code interface -* managed cloud infrastructure for workflow execution -* a dedicated API endpoint for programmatic execution -* hosted documentation -* parallelized CSV-to-batch execution - -To register, we type `latch register ` into our terminal (where -directory_name is the name of the directory holding our code, Dockerfile and -version file). - -The registration process requires a local installation of Docker. +_This feature is in alpha, please contact hannah@latch.bio to gain access._ -To re-register changes, make sure you update the value in the version file. (The -value of the version is not important, only that it is distinct from previously -registered versions). 
+--- -### Remote Registration [Alpha] +## Remote Registration [Alpha] If you do not have access to Docker on your local machine, lack space on your local filesystem for image layers, or lack fast internet to facilitate timely diff --git a/docs/source/examples/workflows_examples.md b/docs/source/examples/workflows_examples.md index 8afc2388..e5ed9c4e 100644 --- a/docs/source/examples/workflows_examples.md +++ b/docs/source/examples/workflows_examples.md @@ -24,4 +24,7 @@ We'll maintain a growing list of well documented examples developed by our commu **Protein Engineering** * [UniRep: mLSTM "babbler" deep representation learner for protein engineering](https://github.com/matteobolner/unirep_latch) - * [FAMSA: Multiple sequence protein alignment](https://github.com/shivaramakrishna99/famsa-latch) \ No newline at end of file + * [FAMSA: Multiple sequence protein alignment](https://github.com/shivaramakrishna99/famsa-latch) + + **Nextflow** + * [A Nextflow workflow to process FastQs](https://github.com/latchbio/wf-rejuvenome-nf_redun_06) \ No newline at end of file diff --git a/docs/source/getting_started/authoring_your_workflow.md b/docs/source/getting_started/authoring_your_workflow.md index e835f99a..b3eaa073 100644 --- a/docs/source/getting_started/authoring_your_workflow.md +++ b/docs/source/getting_started/authoring_your_workflow.md @@ -304,7 +304,19 @@ To test your first workflow on Console, select the **Test Data** and click Launc ![Interface UI](../assets/interface.png) ### Using Latch CLI -To launch the workflow on Latch Console from the CLI, first generate a parameters file: +Using `latch get-wf`, you can view the names of all workflows available in your workspace: +```shell-session +$ latch get-wf + +ID Name Version +65047 wf.deseqf.deseq2_wf 0.0.1r +67056 wf.__init__.aggregate_bulk_rna_seq_counts 0.0.2-eb5e84 +67649 wf.__init__.align_with_salmon 0.0.0-4cd8db +67628 wf.__init__.alphafold_wf v2.2.3+46 +67617 wf.__init__.assemble_and_sort 0.0.1 +``` + +To 
launch the workflow on Latch Console from the CLI, first generate a parameters file by using `latch get-params` and passing in the name of your workflow like so:
 ```shell-session
 $ latch get-params wf.__init__.assemble_and_sort
 ```
@@ -332,6 +344,10 @@ You can view execution statuses from the CLI, run:
 $ latch get-executions
 ```
+![Executions TUI](../assets/executions-tui.png)
+
+The command will open a Terminal UI with the same capabilities as the Executions page on the Latch Platform, where you will see a list of executions, tasks, and logs for easy debugging.
+
 ---
 # Next Steps
 * Read the [Concepts](../basics/what_is_a_workflow.md) page
diff --git a/docs/source/index.md b/docs/source/index.md
index 7473657d..9141abdf 100644
--- a/docs/source/index.md
+++ b/docs/source/index.md
@@ -36,8 +36,6 @@ Latch SDK containerizes and versions the code in the background each time a work
 * **Workflows monitoring**: For batched runs of workflows, we'll aim to provide better dashboard, logs, traces, metrics, and alerting for observability.
 
 ## What the Latch SDK is not
-
-* **A pure workflow orchestration engine**: There are many popular workflow orchestration engines, such as Nextflow or Snakemake, that can be run locally from a bioinformatician's machine. Although workflow orchestration is a feature of Latch SDK, Latch also provides managed infrastructure and no-code interface generation. You can also easily bring existing workflow script of any language to Latch (See examples [here](./examples/workflows_examples.md)).
 * **A self-hosted solution**: Currently, you cannot write your workflow using Latch SDK and host it in your own AWS instance or an HPC. The infrastructure serving bioinformatics pipelines is fully managed by Latch. This allows us to rapidly iterate to bring on high quality features, give cost and performance guarantees, and ensure that security is offered out-of-the-box.
## Examples @@ -77,19 +75,32 @@ getting_started/authoring_your_workflow ```{toctree} :hidden: :maxdepth: 2 -:caption: Concepts +:caption: Defining a Workflow basics/what_is_a_workflow -basics/parameter_types +basics/writing_dockerfiles basics/working_with_files -basics/uploading_test_data -basics/customizing_interface +basics/parameter_types basics/defining_cloud_resources -basics/writing_dockerfiles -basics/local_development +basics/customizing_interface basics/caching basics/conditional_and_map_tasks ``` +```{toctree} +:hidden: +:maxdepth: 2 +:caption: Testing and Debugging a Workflow +basics/local_development +basics/remote_execution +``` + +```{toctree} +:hidden: +:maxdepth: 2 +:caption: Publishing a Workflow +basics/uploading_test_data +``` + ```{toctree} :hidden: :maxdepth: 2 diff --git a/docs/source/tutorials/metamage.md b/docs/source/tutorials/metamage.md index 0777b6d1..74146541 100644 --- a/docs/source/tutorials/metamage.md +++ b/docs/source/tutorials/metamage.md @@ -24,38 +24,36 @@ The workflow is composed of: ### Read pre-processing and host read removal -- [fastp](https://github.com/OpenGene/fastp) for read trimming - and other general pre-processing [^9] -- [BowTie2](https://github.com/BenLangmead/bowtie2) for mapping - to the host genome and extracting unaligned reads [^10] +- [fastp](https://github.com/OpenGene/fastp) for read trimming and other general pre-processing +- [BowTie2](https://github.com/BenLangmead/bowtie2) for mapping to the host genome and extracting unaligned reads ### Assembly -- [MEGAHIT](https://github.com/voutcn/megahit) for assembly [^1] +- [MEGAHIT](https://github.com/voutcn/megahit) for assembly - [MetaQuast](https://github.com/ablab/quast) for assembly evaluation ### Functional annotation - [Macrel](https://github.com/BigDataBiology/macrel) for predicting Antimicrobial Peptide - (AMP)-like sequences from contigs [^6] + (AMP)-like sequences from contigs - [fARGene](https://github.com/fannyhb/fargene) for identifying 
Antimicrobial Resistance Genes - (ARGs) from contigs [^7] + (ARGs) from contigs - [Gecco](https://github.com/zellerlab/GECCO) for predicting biosynthetic gene clusters - (BCGs) from contigs [^8] + (BCGs) from contigs - [Prodigal](https://github.com/hyattpd/Prodigal) for protein-coding - gene prediction from contigs. [^5] + gene prediction from contigs. ### Binning -- BowTie2 and [Samtools](https://github.com/samtools/samtools)[^11] to +- BowTie2 and [Samtools](https://github.com/samtools/samtools) to building depth files for binning. - [MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/) for - binning [^2] + binning ### Taxonomic classification of reads - [Kaiju](https://github.com/bioinformatics-centre/kaiju) for - taxonomic classification [^3] + taxonomic classification - [KronaTools](https://github.com/marbl/Krona/wiki/KronaTools) for visualizing taxonomic classification results