THIS REPO HAS BEEN ARCHIVED
You can use the Zenodo DOI to cite this code:
Docker image available here
A pipeline for automated mitochondrial genome assembly using public data.
Dependencies:
- CAP3
- NOVOPlasty3.7.2
- MIRA4.0.2
- MITObim - Mitofree uses a slightly modified version of the script.
- MITOS
- Python2 and other software
- Python3
- sratoolkit2.10.0
All of the above dependencies can be installed through Bioconda. However, because of dependency conflicts (between Python 2 and Python 3, for instance), manually creating a conda environment for Mitofree can be tricky, so we encourage running the software from the Docker image. Although this README has been written to be as accessible as possible, we highly recommend learning the basics of Docker if you are not familiar with it.
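1 - Install Docker
2 - Pull the Mitofree image
docker pull gavieira/mitofree:latest
3 - Create and enter the container (your home directory is mounted at /mnt, which is also the working directory)
docker run --name mitofree -i -t -v ~:/mnt -w /mnt gavieira/mitofree /bin/bash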
Note: The container has now been created. The next time you need to run Mitofree, you can skip the previous steps by simply starting the container and going straight to step 4. To start the container, run:
docker start -i mitofree
Basic usage:
nohup mitofree.py dataset_list.txt >mitofree.out 2>mitofree.err &
Mitofree's help message:
usage: mitofree.py [-h] [-S] [-M] [--novop_kmer] [--mitob_kmer] [-g] [-s] [-T]
                   FILENAME

Downloads sra NGS data and assembles mitochondrial contigs using NOVOPlasty
and MITObim

positional arguments:
  FILENAME          Path to file with multiple accessions (one per line)

optional arguments:
  -h, --help        show this help message and exit
  -S, --savespace   Automatically removes residual assembly files such as
                    fastq and mitobim iterations
  -M , --maxmemory  Limit of RAM usage for NOVOPlasty. Default: no limit
  --novop_kmer      K-mer used in NOVOPlasty assembly. Default: 39
  --mitob_kmer      K-mer used in MITObim assembly. Default: 73
  -g , --gencode    Genetic code table. Default: 2 (Vertebrate Mitochondrial)
  -s , --subset     Max number of reads used in the assembly process.
                    Default: 50 million reads
  -T , --timeout    Custom timeout for MITObim, in hours. Default: 24h
Please note the -M (--maxmemory) argument, which limits NOVOPlasty's RAM usage (in GB). If you are running this software on a machine with limited RAM, set this option so that NOVOPlasty does not use all of your memory. For instance, on a computer with 8 GB of RAM, you may want to use "-M 7".
The -s (--subset) argument can be used to limit the dataset size, which can also reduce RAM requirements. It can also be used to increase the dataset size, which may be useful if you are having trouble circularizing a mitogenome and have RAM to spare.
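As a rough illustration of the "-M 7" advice for an 8 GB machine, the sketch below derives a memory limit by leaving some headroom for the rest of the system and builds the corresponding command line. The helper name `build_mitofree_command` and the 1 GB default headroom are assumptions made for this example, not part of Mitofree; the sketch also assumes that -M takes a whole number of GB, as in the example above.

```python
import math
from typing import List


def build_mitofree_command(dataset_list: str,
                           total_ram_gb: float,
                           headroom_gb: float = 1.0) -> List[str]:
    """Build a mitofree.py command line with a RAM cap for NOVOPlasty.

    Hypothetical helper: the headroom value is an assumption, chosen so
    that an 8 GB machine yields "-M 7" as suggested in this README.
    """
    # Leave some memory for the operating system and other processes.
    limit = math.floor(total_ram_gb - headroom_gb)
    if limit < 1:
        raise ValueError("not enough RAM to leave the requested headroom")
    return ["mitofree.py", "-M", str(limit), dataset_list]


command = build_mitofree_command("dataset_list.txt", total_ram_gb=8)
print(" ".join(command))
```

The resulting list can be passed to `subprocess.run` or pasted into the nohup command shown under "Basic usage".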
The dataset list file (dataset_list.txt in the usage example above) consists of three tab-separated columns, each holding a specific piece of information:
1-SRA_RUN_NUMBER 2-SPECIES_NAME 3-SEED_GENBANK_ACCESSION
For instance:
ERR1306022 Species1 MK297287
ERR7295165 Species2 MK297241
ERR1306034 Species3 MK291745
#SRR4409513 Species4 MK291678 #This assembly will be skipped
Each line corresponds to a different assembly. This way, you can build a list of as many organisms as you want and assemble their mitogenomes all at once. It is also possible to skip an assembly by adding a hash symbol (#) at the start of its corresponding line.
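To check a dataset list before launching a long run, the format described above can be validated with a short script. This is a minimal sketch, not Mitofree's own parser: the names `Assembly` and `parse_dataset_list` are hypothetical, and skipping blank lines is an assumption made for convenience.

```python
from typing import Iterable, List, NamedTuple


class Assembly(NamedTuple):
    sra_run: str  # column 1: SRA run number
    species: str  # column 2: species name
    seed: str     # column 3: GenBank accession of the seed sequence


def parse_dataset_list(lines: Iterable[str]) -> List[Assembly]:
    """Return one Assembly per active line (hypothetical helper)."""
    assemblies = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.strip()
        # Lines starting with '#' are skipped assemblies; blank lines are
        # ignored as well (an assumption of this sketch).
        if not line or line.startswith("#"):
            continue
        fields = [field.strip() for field in line.split("\t")]
        if len(fields) != 3:
            raise ValueError(
                f"line {lineno}: expected 3 tab-separated columns, "
                f"got {len(fields)}"
            )
        assemblies.append(Assembly(*fields))
    return assemblies


# Same content as the example above, with explicit tab separators.
example = (
    "ERR1306022\tSpecies1\tMK297287\n"
    "ERR7295165\tSpecies2\tMK297241\n"
    "ERR1306034\tSpecies3\tMK291745\n"
    "#SRR4409513\tSpecies4\tMK291678\t#This assembly will be skipped\n"
)

for assembly in parse_dataset_list(example.splitlines()):
    print(assembly.sra_run, assembly.species, assembly.seed)
```

To check a real file, pass an open file handle instead, for example `parse_dataset_list(open("dataset_list.txt"))`.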