TIPseqHunter

Dockerfile for the TIPseqHunter pipeline

Getting Started

Motivation

Here we present a Docker version of the TIPseqHunter pipeline. The image encapsulates all the Java dependencies, read aligners, genome indexes and biological annotation files needed by both steps of the pipeline, TIPseqHunterPipelineJar.sh and TIPseqHunterPipelineJarSomatic.sh, since managing all these dependencies by hand can be tricky depending on the user's expertise.

Prerequisites

To run this container, you need Docker installed.

Acquiring TIPseqHunter Image

Manual Installation

Clone this repository:

$ git clone https://github.com/galantelab/tipseq_hunter.git

TIPseqHunter needs biological annotation files that occupy a few gigabytes. To handle them, we created a gzipped tarball, currently hosted on AWS. To build the Docker image successfully, you must therefore pass the build argument tarball_url, set to the tarball URL, to the docker build command.

Inside the tipseq_hunter folder, build the image:

$ docker build --build-arg tarball_url=https://bioinfohsl-webusers.s3.amazonaws.com/tmiller/tipseq_hunter_data.tar.gz -t tipseqhunter .

Alternatively, and preferably, use the Makefile inside the tipseq_hunter folder:

$ make build
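If the tarball is mirrored somewhere else, only the tarball_url build argument needs to change. The sketch below wraps the docker build command shown above in a bash function; the names `build_image`, `TARBALL_URL` and `DRY_RUN` are hypothetical, not part of the repository. The optional `curl` HEAD request simply fails early when the URL is unreachable, instead of partway through the build.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the documented docker build command.
# TARBALL_URL defaults to the AWS-hosted tarball; override it to use a mirror.
TARBALL_URL="${TARBALL_URL:-https://bioinfohsl-webusers.s3.amazonaws.com/tmiller/tipseq_hunter_data.tar.gz}"

build_image() {
  # Run from inside the tipseq_hunter folder: the build context is "."
  local cmd=(docker build --build-arg "tarball_url=$TARBALL_URL" -t tipseqhunter .)
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    # Print the command instead of running it
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  # Fail fast if the URL does not answer a HEAD request (-f: fail on HTTP errors)
  if ! curl -fsIL "$TARBALL_URL" > /dev/null; then
    echo "tarball not reachable: $TARBALL_URL" >&2
    return 1
  fi
  "${cmd[@]}"
}
```

After sourcing the file, `DRY_RUN=1 build_image` previews the command and `build_image` runs it.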

Pulling Image

Pull the tipseqhunter image from the Docker Hub registry:

$ docker pull galantelab/tipseqhunter

Or, using the Makefile:

$ make pull

Note: if you are not a member of the docker group, you will need to run these commands with sudo.
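A script can check the group list itself instead of guessing whether sudo is needed. The bash sketch below is a hypothetical helper, not part of the repository (`in_docker_group`, `docker_cmd` and `pull_and_tag` are illustrative names). Because docker pull stores the image as galantelab/tipseqhunter while the docker run examples below refer to tipseqhunter, the helper also adds that short name with `docker tag`; this is an assumption about how you want to name the image locally.

```shell
#!/usr/bin/env bash
# Succeed if "docker" is one of the whitespace-separated group names given,
# defaulting to the current user's groups as reported by `id -nG`.
in_docker_group() {
  local groups=" ${1:-$(id -nG)} "
  [[ "$groups" == *" docker "* ]]
}

# Prefix docker with sudo only when the user is not in the docker group.
if in_docker_group; then
  docker_cmd=(docker)
else
  docker_cmd=(sudo docker)
fi

# Pull the published image and give it the short name used by docker run.
pull_and_tag() {
  "${docker_cmd[@]}" pull galantelab/tipseqhunter &&
    "${docker_cmd[@]}" tag galantelab/tipseqhunter tipseqhunter
}
```

Adding yourself to the group (`sudo usermod -aG docker "$USER"`, then logging in again) removes the need for sudo altogether; keep in mind that membership in the docker group grants root-equivalent access to the host.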

Usage

Once the Docker image is installed, you can either use the Makefile, which automates creating the container and running the pipeline, or use the ordinary docker run command.

Examples with docker run

By default, TIPseqHunterPipelineJar.sh and TIPseqHunterPipelineJarSomatic.sh run in a folder private to the container. Change this with the user (-u), working directory (-w) and volume (-v) flags. It is important to mount both the fastq directory and the output directory so that the container can find the input files and write the results back to the host:

$ docker run \
	--rm \
	-u $(id -u):$(id -g) \
	-v path_to_fastq_folder:path_to_fastq_folder \
	-v path_to_output_folder:path_to_output_folder \
	-w path_to_output_folder \
	tipseqhunter \
		TIPseqHunterPipelineJar.sh path_to_fastq_folder path_to_output_folder fastq_r1 key_r1 key_r2 number_of_reads

This command sets the user UID:GID, mounts the input/output directories, sets the working directory to the output folder and, finally, runs the TIPseqHunterPipelineJar.sh script. Because of --rm, the container is removed automatically when it exits.
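The docker run command above can be wrapped in a bash function so the two folders only have to be typed once. This is a sketch under a few assumptions: the name `run_tipseq` and the `DRY_RUN` switch are hypothetical; paths are resolved with `realpath` because -v expects absolute host paths (a bare name is taken as a named volume); and the output folder is created up front because Docker creates a missing bind-mount source as root, which the -u user then cannot write to.

```shell
#!/usr/bin/env bash
# Hypothetical helper around the docker run example above.
run_tipseq() {
  if (( $# != 6 )); then
    echo "usage: run_tipseq FASTQ_DIR OUTPUT_DIR fastq_r1 key_r1 key_r2 number_of_reads" >&2
    return 2
  fi
  [[ -d "$1" ]] || { echo "fastq folder not found: $1" >&2; return 1; }
  # Create the output folder as the current user before Docker creates it as root
  mkdir -p "$2" || return 1
  # Bind mounts need absolute host paths
  local fastq_dir out_dir
  fastq_dir="$(realpath "$1")" || return 1
  out_dir="$(realpath "$2")" || return 1
  shift 2
  # Mount each folder at the same path inside the container
  local cmd=(
    docker run --rm
    -u "$(id -u):$(id -g)"
    -v "$fastq_dir:$fastq_dir"
    -v "$out_dir:$out_dir"
    -w "$out_dir"
    tipseqhunter
    TIPseqHunterPipelineJar.sh "$fastq_dir" "$out_dir" "$@"
  )
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    printf '%s\n' "${cmd[*]}"   # print instead of running
  else
    "${cmd[@]}"
  fi
}
```

For example, with illustrative values, `DRY_RUN=1 run_tipseq ./fastq ./results sample_R1.fastq R1 R2 1000000` prints the full command without starting a container.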

TIPseqHunterPipelineJar.sh and TIPseqHunterPipelineJarSomatic.sh depend on several cutoffs. Each cutoff has a default value, which you can change through environment variables. The most convenient way to set them is to pass a configuration file to the docker run command. The config.env file in this repository is an example, already set to the default values. To use it with docker run:

$ docker run \
	--rm \
	--env-file=config.env \
	-u $(id -u):$(id -g) \
	-v path_to_fastq_folder:path_to_fastq_folder \
	-v path_to_output_folder:path_to_output_folder \
	-w path_to_output_folder \
	tipseqhunter \
		TIPseqHunterPipelineJar.sh path_to_fastq_folder path_to_output_folder fastq_r1 key_r1 key_r2 number_of_reads
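--env-file expects one NAME=value entry per line; blank lines and lines starting with # are skipped, and values are taken literally, quotes included. A typo in a hand-edited copy of config.env may therefore only surface when the run fails. The bash function below is a hypothetical pre-flight check (`check_env_file` is not part of the repository) that reports lines that are neither NAME=value nor a bare NAME, which Docker passes through from the host environment, and warns about quoted values.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check for a docker --env-file such as config.env.
# Returns 1 if any line is not blank, a comment, NAME=value, or a bare NAME.
check_env_file() {
  local file="$1" line n=0 status=0
  local comment_re='^[[:space:]]*(#|$)'
  local entry_re='^[[:space:]]*[A-Za-z_][A-Za-z0-9_]*(=|$)'
  local quoted_re='^[^=]*=["'\'']'
  # The extra test keeps a last line that lacks a trailing newline
  while IFS= read -r line || [[ -n "$line" ]]; do
    n=$((n + 1))
    [[ "$line" =~ $comment_re ]] && continue
    if [[ ! "$line" =~ $entry_re ]]; then
      printf '%s:%d: malformed line: %s\n' "$file" "$n" "$line" >&2
      status=1
    elif [[ "$line" =~ $quoted_re ]]; then
      # Docker keeps the quotes as part of the value
      printf '%s:%d: warning: quotes become part of the value\n' "$file" "$n" >&2
    fi
  done < "$file"
  return "$status"
}
```

A typical use is `check_env_file config.env && docker run --env-file=config.env ...`, with the rest of the command unchanged.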

Examples with Makefile

The Makefile can be used to build, pull and run the TIPseqHunter scripts inside docker:

$ make

help                           This help
build                          Build the image
build-nc                       Build the image without caching
pull                           Pull the latest tagged image from the dockerhub registry
remove                         Remove the latest tagged image
run                            Run TIPseqHunter pipeline completely
run-pipeline                   Run TIPseqHunterPipelineJar.sh
run-pipeline-somatic           Run TIPseqHunterPipelineJarSomatic.sh
up                             Pull and run TIPseqHunter pipeline completely
stop                           Stop and remove a running container
version                        Output the current version

When running the pipeline, the Makefile automatically searches for a file named config.env in the current directory, so if it exists, you can just call:

$ make run

Or use another file with a different name:

$ make run CONFIG=another_config.txt

The arguments to TIPseqHunterPipelineJar.sh and TIPseqHunterPipelineJarSomatic.sh can be set in config.env or passed on the command line:

$ make run \
	CONFIG=another_config.txt \
	INPUT_DIR=fastq_folder \
	OUTPUT_DIR=output_folder \
	FASTQ_R1=example_R1.fa \
	KEY_R1=R1 \
	KEY_R2=R2 \
	READ_NUM=123456
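When several libraries sit in the same folder, the make run call above can be repeated per sample. The loop below is only a sketch built on assumptions you should check against your data: paired files are named <sample>_R1.fastq / <sample>_R2.fastq with R1/R2 as the keys, each sample gets its own output folder, and READ_NUM is the number of reads in the R1 file, counted as lines divided by four (valid only for uncompressed FASTQ with one sequence line per record). `run_all_samples` and `DRY_RUN` are hypothetical names.

```shell
#!/usr/bin/env bash
# Hypothetical batch driver: one "make run" per *_R1.fastq file in a folder.
# Set DRY_RUN=1 to print the make commands instead of running them.
run_all_samples() {
  local input_dir="$1" output_root="$2" r1 name sample reads
  local -a cmd
  for r1 in "$input_dir"/*_R1.fastq; do
    [[ -e "$r1" ]] || continue          # no match: the glob stays literal
    name="$(basename "$r1")"
    sample="${name%_R1.fastq}"
    # Assumption: 4 lines per read in an uncompressed FASTQ file
    reads=$(( $(wc -l < "$r1") / 4 ))
    cmd=(make run
      INPUT_DIR="$input_dir"
      OUTPUT_DIR="$output_root/$sample"
      FASTQ_R1="$name"
      KEY_R1=R1 KEY_R2=R2
      READ_NUM="$reads")
    if [[ "${DRY_RUN:-0}" == 1 ]]; then
      printf '%s\n' "${cmd[*]}"
    else
      mkdir -p "$output_root/$sample" && "${cmd[@]}"
    fi
  done
}
```

Run it from the folder that holds the Makefile (and config.env, if you use one), for example `DRY_RUN=1 run_all_samples fastq_folder output_folder` to review the generated commands first.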

That is it! 😄
