Pipeline for plant organelle (chloroplast and mitochondrion) genome assembly
The following dependencies need to be installed manually:
The following dependencies can be installed using conda (recommended):
Clone the bash-common repository, then install the bash library (tag: mitology) system-wide under the /usr/local/share directory (requires sudo privileges):
git clone https://github.com/jos4uke/bash-common.git
cd bash-common
sudo bash install.sh mitology /usr/local
Clone the log4sh repository into /usr/local/share (requires sudo privileges):
cd /usr/local/share
sudo git clone https://github.com/kward/log4sh.git
Clone the mitology-pipeline repository into /usr/local/share (requires sudo privileges):
cd /usr/local/share
sudo git clone https://github.com/jos4uke/mitology-pipeline.git
cd mitology-pipeline
# tree
├── bin # contains the main pipeline script
│   └── mitology-pipeline.sh
├── COPYING
├── README.md
└── share # contains the pipeline resources (configuration, libraries, helper scripts)
    └── mitology-pipeline
        ├── etc # contains the pipeline configuration file to be copied by the end user
        │   └── mitology-pipeline_user.config
        ├── lib # contains specific bash library functions
        │   ├── mitology-alignments_lib.inc
        │   └── mitology-pipeline_lib.inc
        └── scripts # helper scripts
            ├── plot_hashcount.sh
            ├── R
            │   ├── length-weigthed_kmer_coverage_hist.rplot.R
            │   └── plot_hashcount_hist.R
            ├── run_meta-velvetg.sh
            └── split_files.sh
7 directories, 12 files
Steps:
- get a copy of the configuration file
- update the configuration file
- run the pipeline
cp share/mitology-pipeline/etc/mitology-pipeline_user.config .
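The second step, updating the configuration file, can be done in any text editor. For scripted setups, a minimal sketch of a non-interactive alternative follows; set_config_value is a hypothetical helper (not shipped with mitology-pipeline) and assumes one <key>=<value> pair per line with no spaces around the = sign.

```bash
# Hypothetical helper (not part of mitology-pipeline): set <key>=<value>
# inside a given [section] of an INI-style config file, in place.
# Assumes "key=value" lines with no spaces around "=".
# Note: backslashes in the value are interpreted by awk -v.
set_config_value() {
    local file=$1 section=$2 key=$3 value=$4 tmp rc
    tmp=$(mktemp) || return 1
    # awk writes the edited file to $tmp and exits 1 if the key was not found
    # in the requested section, in which case the original file is untouched.
    awk -v sec="[$section]" -v key="$key" -v val="$value" '
        /^\[.*\]$/ { in_sec = ($0 == sec) }      # track the current section
        in_sec && index($0, key "=") == 1 { $0 = key "=" val; found = 1 }
        { print }
        END { exit (found ? 0 : 1) }
    ' "$file" > "$tmp" && cat "$tmp" > "$file"
    rc=$?
    rm -f "$tmp"
    return "$rc"
}
```

For example, set_config_value mitology-pipeline_user.config sample name_alias my_sample would change the sample alias (my_sample is a placeholder). The file is rewritten through cat rather than mv so that its original permissions are kept.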
The configuration file contains the following sections:
- paths
- genome_alias
- sample
- khmer_load_into_counting
- khmer_filter_abund
- contig_assembler
- scaffolder
- velveth
- velvetg
- meta_velvetg
- quast
- bwa_aln
- bwa_sampe
- filtering
- samtools_view
- samtools_mpileup
- bcftools_view
- bgzip
In the [paths] section, you need to update the paths to all resources, listed as <key>=<value> pairs.
GENOMES_BASE_PATH and INDEXES_BASE_PATH are the root directory paths to the genome files and the tool indexes, respectively. BWA_INDEXES and SAMTOOLS_INDEXES are the locations of the bwa and samtools index directories, respectively. bcftools, bgzip, bwa, khmer_*, etc. are the paths to the corresponding tools. They should be reachable from your PATH; otherwise, please provide the absolute path. It is recommended to install all these tools using conda.
In the [genome_alias] section, you need to update all prefixes/aliases of the reference genomes (used notably for indexes), listed as <key>=<value> pairs. The ref and ref_short aliases refer to the host genome, whereas the ref_mito* and ref_chloro* aliases refer to the mitochondrial and chloroplast genomes or annotations (gff), respectively.
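Reference aliases are notably used as index prefixes. Assuming, hypothetically, that the bwa index for an alias such as ref_mito is stored at ${BWA_INDEXES}/<alias> (the real layout depends on your configuration), the sketch below checks that the five files written by bwa index (.amb, .ann, .bwt, .pac, .sa) are all present. check_bwa_index is an illustrative name, not a pipeline function.

```bash
# Hypothetical check (not part of mitology-pipeline): given a bwa index prefix,
# e.g. "${BWA_INDEXES}/${ref_mito}" (assumed layout), verify that every file
# produced by `bwa index` exists. Reports all missing files, not just the first.
check_bwa_index() {
    local prefix=$1 ext status=0
    for ext in amb ann bwt pac sa; do
        if [ ! -f "${prefix}.${ext}" ]; then
            echo "missing bwa index file: ${prefix}.${ext}" >&2
            status=1
        fi
    done
    return "$status"
}
```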
In the [sample] section, you need to update the sample alias and paths, listed as <key>=<value> pairs. name_alias is used to identify the sample in the pipeline, and seqfile_parent_dir is the parent directory of the paired-end read files specified by the seqfile_{R1,R2} keys.
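The sample settings combine a parent directory with two file names. A minimal sketch of that resolution follows, assuming that seqfile_R1 and seqfile_R2 hold file names relative to seqfile_parent_dir; resolve_reads is a hypothetical helper that prints the two full paths, or fails if a read file is missing.

```bash
# Hypothetical helper (not part of mitology-pipeline): join seqfile_parent_dir
# with the seqfile_R1 / seqfile_R2 file names (assumed relative) and check that
# both read files exist. Prints the two resolved paths, one per line.
resolve_reads() {
    local parent_dir=${1%/} r1=$2 r2=$3 f   # drop one trailing slash, if any
    for f in "$parent_dir/$r1" "$parent_dir/$r2"; do
        if [ ! -f "$f" ]; then
            echo "read file not found: $f" >&2
            return 1
        fi
    done
    printf '%s\n' "$parent_dir/$r1" "$parent_dir/$r2"
}
```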
The other sections hold options specific to individual pipeline steps and tools, again listed as <key>=<value> pairs. Please refer to each tool's documentation when updating the option values.
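All sections share the same [section] header plus <key>=<value> layout, so a single lookup function can read any option. The sketch below is a hypothetical awk-based reader; the pipeline's own configuration parsing may behave differently, and the sketch assumes no spaces around = and no duplicate keys within a section.

```bash
# Hypothetical helper (not part of mitology-pipeline): print the value of <key>
# in [section] of an INI-style file. Returns 1 if the key is absent.
get_config_value() {
    local file=$1 section=$2 key=$3
    awk -v sec="[$section]" -v key="$key" '
        /^\[.*\]$/ { in_sec = ($0 == sec); next }   # track the current section
        in_sec && index($0, key "=") == 1 {
            print substr($0, length(key) + 2)       # text after "key="
            found = 1
            exit
        }
        END { exit (found ? 0 : 1) }
    ' "$file"
}
```

For example, get_config_value mitology-pipeline_user.config sample name_alias prints the configured sample alias.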
Run the pipeline with the updated configuration file, writing the results into the results directory:
bash bin/mitology-pipeline.sh -c mitology-pipeline_user.config -o results
For usage help, run bash bin/mitology-pipeline.sh --help
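Before launching a long run, it may help to confirm that the configuration file is readable and declares the sections described above. check_config_sections is a hypothetical sketch (not part of mitology-pipeline) that only looks for the [paths], [genome_alias] and [sample] headers; it does not validate their values.

```bash
# Hypothetical pre-flight check (not part of mitology-pipeline): make sure the
# config file is readable and contains the section headers edited by hand.
check_config_sections() {
    local file=$1 section status=0
    if [ ! -r "$file" ]; then
        echo "config file not readable: $file" >&2
        return 1
    fi
    for section in paths genome_alias sample; do
        # -x: whole-line match, -F: treat the brackets literally
        if ! grep -qxF "[$section]" "$file"; then
            echo "missing section [$section] in $file" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage (run from the mitology-pipeline directory):
#   check_config_sections mitology-pipeline_user.config &&
#       bash bin/mitology-pipeline.sh -c mitology-pipeline_user.config -o results
```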
TODO:
- describe all output files here
- provide a test dataset
IJPB Bioinformatics team