Skip to content

Commit

Permalink
Merge pull request #156 from latchbio/hannah/misc-docs
Browse files Browse the repository at this point in the history
Misc docs changes
  • Loading branch information
kennyworkman authored Sep 7, 2022
2 parents ddd9eee + 99dd34e commit e18a420
Show file tree
Hide file tree
Showing 9 changed files with 374 additions and 351 deletions.
Binary file added docs/source/assets/executions-tui.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
Binary file added docs/source/assets/latch-exec.png
Loading
Sorry, something went wrong. Reload?
Sorry, we cannot display this file.
Sorry, this file is invalid so it cannot be displayed.
311 changes: 311 additions & 0 deletions docs/source/basics/draft.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,311 @@
# Parking Lot

Formally, a workflow can be described as a [directed acyclic graph](https://en.wikipedia.org/wiki/Directed_acyclic_graph) (DAG), where each
node in the graph is called a task. This computational graph is a flexible model
to describe most any bioinformatics analysis.

In this example, a workflow ingests sequencing files in FastQ format and
produces a sorted assembly file. The workflow's DAG has two tasks. The first
task turns the FastQ files into a single BAM file using an assembly algorithm.
The second task sorts the assembly from the first task. The final output is a
useful assembly conducive to downstream analysis and visualization in tools like
[IGV](https://software.broadinstitute.org/software/igv/).

The Latch SDK lets you define your workflow tasks as python functions.
The parameters in the function signature define the task inputs and return
values define the task outputs. The body of the function holds the task logic,
which can be written in plain python or can be subprocessed through a
program/library in any language.

```python
@small_task
def assembly_task(read1: LatchFile, read2: LatchFile) -> LatchFile:

# A reference to our output.
sam_file = Path("covid_assembly.sam").resolve()

_bowtie2_cmd = [
"bowtie2/bowtie2",
"--local",
"-x",
"wuhan",
"-1",
read1.local_path,
"-2",
read2.local_path,
"--very-sensitive-local",
"-S",
str(sam_file),
]

subprocess.run(_bowtie2_cmd)

return LatchFile(str(sam_file), "latch:///covid_assembly.sam")
```

These tasks are then "glued" together in another function that represents the
workflow. The workflow function body simply chains the task functions by calling
them and passing returned values to downstream task functions. Notice that our
workflow function calls the task that we just defined, `assembly_task`, as well
as another task we can assume was defined elsewhere, `sort_bam_task`.

You must not write actual logic in the workflow function body. It can only be
used to call task functions and pass task function return values to downstream
task functions. Additionally all task functions must be called with keyword
arguments. You also cannot access variables directly in the workflow function;
in the example below, you would not be able to pass in `read1=read1.local_path`.

```python
@workflow
def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile:

sam = assembly_task(read1=read1, read2=read2)
return sort_bam_task(sam=sam)
```

Workflow function docstrings also contain markdown formatted
documentation and a DSL to specify the presentation of parameters when the
workflow interface is generated. We'll add this content to the docstring of the
workflow function we just wrote.

```python
@workflow
def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile:
"""Description...
markdown header
----
Write some documentation about your workflow in
markdown here:
> Regular markdown constructs work as expected.
# Heading
* content1
* content2
__metadata__:
display_name: Assemble and Sort FastQ Files
author:
name:
email:
github:
repository:
license:
id: MIT
Args:
read1:
Paired-end read 1 file to be assembled.
__metadata__:
display_name: Read1
read2:
Paired-end read 2 file to be assembled.
__metadata__:
display_name: Read2
"""

sam = assembly_task(read1=read1, read2=read2)
return sort_bam_task(sam=sam)
```

## Workflow Code Structure

So far we have defined workflows and tasks as python functions but we don't know
where to put them or what supplementary files might be needed to run the code on
the Latch platform.

Workflow code needs to live in directory with three necessary
elements:

* a file named `Dockerfile` that defines the computing environment of your tasks
* a file named `version` that holds the plaintext version of the workflow
* a directory named `wf` that holds the python code needed for the workflow.
* task and workflow functions must live in a `wf/__init__.py` file

These three elements must be named as specified above. The directory should have
the following structure:

```text
├── Dockerfile
├── version
└── wf
└── __init__.py
```

The SDK ships with easily retrievable example workflow code. Just type
`latch init myworkflow` to construct a directory structured as above for
reference or boilerplate.

### Example `Dockerfile`

**Note**: you are required to use our base image for the time being.

```Dockerfile
FROM 812206152185.dkr.ecr.us-west-2.amazonaws.com/latch-base:9a7d-main

# Its easy to build binaries from source that you can later reference as
# subprocesses within your workflow.
RUN curl -L https://sourceforge.net/projects/bowtie-bio/files/bowtie2/2.4.4/bowtie2-2.4.4-linux-x86_64.zip/download -o bowtie2-2.4.4.zip &&\
unzip bowtie2-2.4.4.zip &&\
mv bowtie2-2.4.4-linux-x86_64 bowtie2

# Or use managed library distributions through the container OS's package
# manager.
RUN apt-get update -y &&\
apt-get install -y autoconf samtools


# You can use local data to construct your workflow image. Here we copy a
# pre-indexed reference to a path that our workflow can reference.
COPY data /root/reference
ENV BOWTIE2_INDEXES="reference"

COPY wf /root/wf

# STOP HERE:
# The following lines are needed to ensure your build environement works
# correctly with latch.
ARG tag
ENV FLYTE_INTERNAL_IMAGE $tag
RUN sed -i 's/latch/wf/g' flytekit.config
RUN python3 -m pip install --upgrade latch
WORKDIR /root
```

### Example `version` File

You can use any versioning scheme that you would like, as long as each register
has a unique version value. We recommend sticking with [semantic
versioning](https://semver.org/).

```text
v0.0.0
```

### Example `wf/__init__.py` File

```python
import subprocess
from pathlib import Path

from latch import small_task, workflow
from latch.types import LatchFile


@small_task
def assembly_task(read1: LatchFile, read2: LatchFile) -> LatchFile:

# A reference to our output.
sam_file = Path("covid_assembly.sam").resolve()

_bowtie2_cmd = [
"bowtie2/bowtie2",
"--local",
"-x",
"wuhan",
"-1",
read1.local_path,
"-2",
read2.local_path,
"--very-sensitive-local",
"-S",
str(sam_file),
]

subprocess.run(_bowtie2_cmd)

return LatchFile(str(sam_file), "latch:///covid_assembly.sam")


@small_task
def sort_bam_task(sam: LatchFile) -> LatchFile:

bam_file = Path("covid_sorted.bam").resolve()

_samtools_sort_cmd = [
"samtools",
"sort",
"-o",
str(bam_file),
"-O",
"bam",
sam.local_path,
]

subprocess.run(_samtools_sort_cmd)

return LatchFile(str(bam_file), "latch:///covid_sorted.bam")


@workflow
def assemble_and_sort(read1: LatchFile, read2: LatchFile) -> LatchFile:
"""Description...
markdown header
----
Write some documentation about your workflow in
markdown here:
> Regular markdown constructs work as expected.
# Heading
* content1
* content2
__metadata__:
display_name: Assemble and Sort FastQ Files
author:
name:
email:
github:
repository:
license:
id: MIT
Args:
read1:
Paired-end read 1 file to be assembled.
__metadata__:
display_name: Read1
read2:
Paired-end read 2 file to be assembled.
__metadata__:
display_name: Read2
"""
sam = assembly_task(read1=read1, read2=read2)
return sort_bam_task(sam=sam)
```

## What happens at registration?

Now that we've defined our functions, we are ready to register our workflow with
the [LatchBio](https://latch.bio) platform. This will give us:

* a no-code interface
* managed cloud infrastructure for workflow execution
* a dedicated API endpoint for programmatic execution
* hosted documentation
* parallelized CSV-to-batch execution

To register, we type `latch register <directory_name>` into our terminal (where
directory_name is the name of the directory holding our code, Dockerfile and
version file).

The registration process requires a local installation of Docker.

To re-register changes, make sure you update the value in the version file. (The
value of the version is not important, only that it is distinct from previously
registered versions).
24 changes: 3 additions & 21 deletions docs/source/basics/local_development.md
Original file line number Diff line number Diff line change
@@ -1,31 +1,13 @@
# Local Development

Executing workflows on the LatchBio platform is heavily encouraged for
consistent behavior.
Executing workflows on the LatchBio platform is heavily encouraged for consistent behavior.

Workflows often deal with enormous files that are too large for local
development environments and sometimes require computing resources that cannot
be accommodated by local machines or are just unavailable (eg. GPUs). Thus,
there are many cases when local executions with smaller files or reduced
resources may behave differently than on properly configured cloud
infrastructure. Local execution should never be a substitute for testing
workflow logic on the platform itself.
Workflows often deal with enormous files that are too large for local development environments and sometimes require computing resources that cannot be accommodated by local machines or are just unavailable (eg. GPUs). Thus, there are many cases when local executions with smaller files or reduced resources may behave differently than on properly configured cloud infrastructure. Local execution should never be a substitute for testing workflow logic on the platform itself.

However, the ability to quickly iterate and debug task logic locally is
certainly useful for teasing out many types of bugs.

Using a `if __name__ == "__main__":` clause is a useful way to tag local
function calls with sample values. Running `python3 wf/__init__.py` will become
an entrypoint for quick debugging.

To run the same entrypoint _within_ the latest registered container, one can run
`latch local-execute <PATH_TO_WORKFLOW_DIR>`. This gives the same confidence in
reproducible behavior one would usually receive post registration but with the
benefits of fast local development. Note that workflow code is
mounted inside the latest container build so that rebuilds are not consistently
made with rapid changes.

More information [here](https://docs.latch.bio/subcommands.html#latch-local-execute).
Using a `if __name__ == "__main__":` clause is a useful way to tag local function calls with sample values. Running `python3 wf/__init__.py` will become an entrypoint for quick debugging.

Here is an example of a minimal `wf/__init__.py` file that demonstrates local
execution:
Expand Down
Loading

0 comments on commit e18a420

Please sign in to comment.