Metabarcoding analyses pipeline - metaBarcoding and Environmental DNA Analysis Tool

Haoxiyang-bio/metaBEAT

 
 

metaBEAT - metaBarcoding and Environmental DNA analysis Tool

Reproducible pipeline for the analysis of metabarcoding data generated by either Sanger or NGS approaches.

metaBEAT relies on a number of external programs. To make your life easier, we have created a self-contained environment with all the necessary software in a Docker image. The image builds on ReproPhylo. To use it, you need Docker installed on your machine.

How to use the image:

Run the metaBEAT script in the container (you can process data in your current working directory or its subdirectories):

sudo docker run --rm --net=host --name metaBEAT -v $(pwd):/home/working chrishah/metabeat metaBEAT_global.py -h
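If you want to launch the container from a Python script rather than typing the command by hand, the call above can be assembled as an argument list. This is only a minimal sketch: the helper name build_metabeat_command is hypothetical and not part of metaBEAT, and it assumes Docker can be run without sudo (prepend "sudo" to the list otherwise).

```python
import os
import shlex


def build_metabeat_command(workdir, script_args, image="chrishah/metabeat"):
    """Build the docker argument list that mirrors the README command.

    Hypothetical helper for illustration; it is not shipped with metaBEAT.
    """
    # Docker bind mounts need an absolute host path, like $(pwd) in the shell.
    workdir = os.path.abspath(workdir)
    return [
        "docker", "run", "--rm", "--net=host", "--name", "metaBEAT",
        "-v", f"{workdir}:/home/working",
        image, "metaBEAT_global.py", *script_args,
    ]


cmd = build_metabeat_command(".", ["-h"])
# Print the command as a copy-pasteable shell line.
print(shlex.join(cmd))
# To actually start the container:
#   import subprocess; subprocess.run(cmd, check=True)
```

Passing image="chrishah/metabeat:v0.6" pins the run to the tagged image listed under VERSIONS.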

In a terminal window, start the container with your current working directory mounted at /home/working and enter the self-contained environment through a shell:

sudo docker run -i -t --net=host --name metaBEAT -v $(pwd):/home/working chrishah/metabeat /bin/bash

Or access the container via a Jupyter notebook by running the start_metaBEAT_nb script with the full path to your desired mount point, e.g.:

./start_metaBEAT_nb $(pwd) --xt

This opens a Jupyter notebook in a new tab of your default browser. The browser will first warn you that your connection is not private: click Advanced at the bottom left and proceed to localhost (unsafe). You will then be asked for a password, which is simply password. After entering it, the Jupyter notebook opens and you are good to go.

Once you are done, you should stop the container by simply running:

stop_metaBEAT_nb

Within the environment you can then execute the scripts that come with metaBEAT, e.g.:

metaBEAT_global.py

Executing a script without any options will usually display the usage, e.g.:

usage: metaBEAT_global.py [-h] [-Q <FILE>] [-v] [-s] [-f] [-p] [-t] [-b]
                   [-m <string>] [-n <INT>] [-E] [-e] [--PCR_primer <FILE>]
                   [--trim_adapter <FILE>] [--trim_qual <INT>]
                   [--trim_window <INT>] [--trim_minlength <INT>] [--merge]
                   [--product_length <INT>] [--phred <INT>] [-R <FILE>]
                   [--gb_out <FILE>] [--rec_check] [--cluster]
                   [--clust_match <FLOAT>] [--clust_cov <INT>] [--www]
                   [--min_ident <FLOAT>] [--min_bit <INT>] [--refpkg <DIR>]
                   [-o OUTPUT_PREFIX] [--metadata METADATA] [--mock_meta_data]
                   [--version]

metaBEAT - metaBarcoding and Environmental DNA Analyses tool

optional arguments:
  -h, --help            show this help message and exit
  -Q <FILE>, --querylist <FILE>
                        file containing a list of query files
  -v, --verbose         turn verbose output on
  -s, --seqinfo         write out seq_info.csv file
  -f, --fasta           write out ref.fasta file
  -p, --phyloplace      perform phylogenetic placement
  -t, --taxids          write out taxid.txt file
  -b, --blast           compile local blast db and blast queries
  -m <string>, --marker <string>
                        marker ID (default: marker)
  -n <INT>, --n_threads <INT>
                        Number of threads (default: 1)
  -E, --extract_centroid_reads
                        extract centroid reads to files
  -e, --extract_all_reads
                        extract reads to files
  --version             show program's version number and exit

Query preprocessing:
  The parameters in this group affect how the query sequences are processed

  --PCR_primer <FILE>   PCR primers (provided in fasta file) to be clipped
                        from reads
  --trim_adapter <FILE>
                        trim adapters provided in file
  --trim_qual <INT>     minimum phred quality score (default: 30)
  --trim_window <INT>   sliding window size (default: 5) for trimming; if
                        average quality drops below the specified minimum
                        quality all subsequent bases are removed from the
                        reads
  --trim_minlength <INT>
                        minimum length of reads to be retained after trimming
                        (default: 50)
  --merge               attempt to merge paired-end reads
  --product_length <INT>
                        estimated length of PCR product (default: 100)
  --phred <INT>         phred quality score offset - 33 or 64 (default: 33)

Reference:
  The parameters in this group affect the reference to be used in the
  analyses

  -R <FILE>, --REFlist <FILE>
                        file containing a list of files to be used as
                        reference sequences
  --gb_out <FILE>       output the corrected gb file
  --rec_check           check records to be used as reference

Query clustering options:
  The parameters in this group affect read clustering

  --cluster             perform clustering of query sequences using vsearch
  --clust_match <FLOAT>
                        identity threshold for clustering in percent (default:
                        1)
  --clust_cov <INT>     minimum number of records in cluster (default: 1)

BLAST search:
  The parameters in this group affect BLAST search and BLAST based taxonomic
  assignment

  --www                 perform online BLAST search against nt database
  --min_ident <FLOAT>   minimum identity threshold in percent (default: 0.95)
  --min_bit <INT>       minimum bitscore (default: 80)

Phylogenetic placement:
  The parameters in this group affect phylogenetic placement

  --refpkg <DIR>        PATH to refpkg

BIOM OUTPUT:
  The arguments in this groups affect the output in BIOM format

  -o OUTPUT_PREFIX, --output_prefix OUTPUT_PREFIX
                        prefix for BIOM output files (default='metaBEAT')
  --metadata METADATA   comma delimited file containing metadata (optional)
  --mock_meta_data      add mock metadata to the samples in the BIOM output
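
To make the trimming options in the usage text more concrete, here is a small sketch of how --phred, --trim_qual, --trim_window and --trim_minlength interact, following the wording of the help text. It is an illustration only and not metaBEAT's implementation (the pipeline relies on external programs for this step); in particular, cutting the read at the start of the first failing window is an assumption, and the function names are hypothetical.

```python
def phred_scores(quality, offset=33):
    """Decode a FASTQ quality string into integer phred scores (--phred)."""
    return [ord(ch) - offset for ch in quality]


def sliding_window_trim(seq, quality, min_qual=30, window=5,
                        min_length=50, offset=33):
    """Trim a read once the windowed average quality drops below min_qual.

    Returns (sequence, quality) or None if the trimmed read is shorter
    than min_length. Defaults mirror those listed in the usage text.
    """
    scores = phred_scores(quality, offset)
    cut = len(seq)  # keep the whole read unless a window fails
    for start in range(len(scores) - window + 1):
        win = scores[start:start + window]
        if sum(win) / window < min_qual:
            # Assumption: drop everything from the start of this window on.
            cut = start
            break
    if cut < min_length:
        return None  # read too short after trimming (--trim_minlength)
    return seq[:cut], quality[:cut]


# 'I' encodes phred 40 and '#' encodes phred 2 with offset 33.
read = "ACGT" * 3
qual = "I" * 8 + "#" * 4
print(sliding_window_trim(read, qual, min_length=4))
print(sliding_window_trim(read, qual))  # default min_length of 50
```

With the default minimum length of 50 the short example read is discarded, which is why the first call lowers min_length.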

VERSIONS

v. 0.6:

  • docker image for this version is: chrishah/metabeat:v0.6
  • used for Kitson et al. 2015
