*Last updated: May 18, 2021*
FunGAP is freely available for academic use. For commercial use or licensing of FunGAP, please contact In-Geol Choi (email: igchoi (at) korea.ac.kr). Please cite the following reference.
Reference: Byoungnam Min, Igor V Grigoriev, In-Geol Choi, FunGAP: Fungal Genome Annotation Pipeline using evidence-based gene model evaluation (2017), Bioinformatics, Volume 33, Issue 18, Pages 2936–2937, https://doi.org/10.1093/bioinformatics/btx353
Please don't hesitate to post on Issues or contact me (mbnmbn00@gmail.com) for help. These steps were tested on a freshly installed Ubuntu 20.04.2 LTS.
Using Docker is the most reliable and robust way to install FunGAP. Please follow the instructions.
Although we recommend using Docker, it is not available in some environments (e.g., HPC clusters). In that case, please use the following instructions for a conda-based FunGAP installation.
- Hisat2 v2.2.1
- Trinity v2.12.0
- RepeatModeler v2.0.1
- Maker v3.01.03
- GeneMark-ES/ET v4.65_lic
- Augustus v3.4.0
- Braker v2.1.5
- BUSCO v5.1.2
- Pfam_scan v1.6
- BLAST v2.11.0
- Samtools v1.10
- Bamtools v2.5.1
- Pfam release 34.0
Download and install Anaconda3 (we assume that you install it in $HOME/anaconda3).
# Download and install conda
cd $HOME
wget https://repo.anaconda.com/archive/Anaconda3-2021.05-Linux-x86_64.sh
bash Anaconda3-2021.05-Linux-x86_64.sh
# Set environment if you select "no" to "Do you wish the installer to initialize Anaconda3?"
echo ". $HOME/anaconda3/etc/profile.d/conda.sh" >> ~/.bashrc
source $HOME/.bashrc
which conda # It should be $HOME/anaconda3/condabin/conda
# Get up-to-date conda
conda update conda
# Install Mamba package manager (faster!)
conda install mamba -n base -c conda-forge
# Create FunGAP environment and install dependencies using Mamba
conda create -y -n fungap
conda activate fungap
mamba install \
braker2=2.1.5 trinity=2.12.0 repeatmodeler=2.0.1 hisat2=2.2.1 pfam_scan=1.6 busco=5.1.2 \
-c bioconda -c conda-forge
# Install Python and Perl modules (within fungap environment)
pip install biopython bcbio-gff markdown2 matplotlib
cpanm YAML Hash::Merge Logger::Simple Parallel::ForkManager MCE::Mutex Thread::Queue threads
# Install Maker using Mamba in a separate environment (Maker conflicts with BUSCO)
conda deactivate
conda create -y -n maker
conda activate maker
mamba install maker=3.01.03 -c bioconda -c conda-forge
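After both environments are created, a quick check that the expected executables resolve on PATH can catch a failed install early. This is a minimal sketch: check_tools is a hypothetical helper, not part of FunGAP, and the executable names in the usage comments are assumptions that may differ between package builds.

```shell
# check_tools NAME...
# Reports which of the given executables resolve on PATH.
# Returns 0 only if every name is found, 1 otherwise.
check_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}

# Usage (executable names are assumptions; adjust to your installation):
# conda activate fungap && check_tools hisat2 Trinity RepeatModeler braker.pl busco pfam_scan.pl
# conda activate maker  && check_tools maker
```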
Download FunGAP by cloning the GitHub repository. We assume here that you are installing FunGAP in your $HOME directory, but you are free to change the location. $FUNGAP_DIR is going to be your FunGAP installation directory.
cd $HOME # or wherever you want
git clone https://github.com/CompSynBioLab-KoreaUniv/FunGAP.git
export FUNGAP_DIR=$(realpath FunGAP/)
# You can put this export command (with the absolute path) in your .bashrc file
# so that you don't need to type it every time you run FunGAP
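One way to make the variable persistent without piling up duplicate lines is sketched here. persist_export is a hypothetical helper name, and the default target of ~/.bashrc is an assumption about your shell setup.

```shell
# persist_export VAR VALUE [RCFILE]
# Appends 'export VAR="VALUE"' to RCFILE (default: ~/.bashrc) unless that
# exact line is already present, so running it twice is harmless.
persist_export() {
    local var="$1" value="$2" rcfile="${3:-$HOME/.bashrc}"
    local line="export ${var}=\"${value}\""
    touch "$rcfile"
    if ! grep -qxF -- "$line" "$rcfile"; then
        printf '%s\n' "$line" >> "$rcfile"
    fi
}

# Usage (run after the export above, so $FUNGAP_DIR is an absolute path):
# persist_export FUNGAP_DIR "$FUNGAP_DIR"
```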
Download the Pfam databases into your $FUNGAP_DIR/db directory.
ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release
mkdir -p $FUNGAP_DIR/db/pfam
cd $FUNGAP_DIR/db/pfam
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.gz
wget ftp://ftp.ebi.ac.uk/pub/databases/Pfam/current_release/Pfam-A.hmm.dat.gz
gunzip Pfam-A.hmm.gz Pfam-A.hmm.dat.gz
conda activate fungap
hmmpress Pfam-A.hmm # HMMER package (would be automatically installed in the above Anaconda step)
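hmmpress writes four binary index files next to Pfam-A.hmm (.h3f, .h3i, .h3m, .h3p). The sketch below confirms that they and the unpacked database files are present and non-empty. check_pfam_db is a hypothetical helper, not part of FunGAP or HMMER.

```shell
# check_pfam_db DIR
# Confirms that DIR holds the unpacked Pfam files plus the four binary
# indexes written by hmmpress. Returns 1 if any file is missing or empty.
check_pfam_db() {
    local dir="$1" status=0 f
    for f in Pfam-A.hmm Pfam-A.hmm.dat \
             Pfam-A.hmm.h3f Pfam-A.hmm.h3i Pfam-A.hmm.h3m Pfam-A.hmm.h3p; do
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage:
# check_pfam_db "$FUNGAP_DIR/db/pfam" && echo "Pfam database looks complete"
```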
Go to the site below and download GeneMark-ES/ET. Don't forget to download the key, too. http://topaz.gatech.edu/GeneMark/license_download.cgi
mkdir -p $FUNGAP_DIR/external/
mv gmes_linux_64.tar.gz gm_key_64.gz $FUNGAP_DIR/external/ # Move your downloaded files to this directory
cd $FUNGAP_DIR/external/
tar -zxvf gmes_linux_64.tar.gz
gunzip gm_key_64.gz
cp gm_key_64 ~/.gm_key
GeneMark scripts are hard-coded to use /usr/bin/perl instead of the conda-installed perl. You can change this by running the change_path_in_perl_scripts.pl script.
cd $FUNGAP_DIR/external/gmes_linux_64_4/
perl change_path_in_perl_scripts.pl "/usr/bin/env perl"
cd $FUNGAP_DIR/external/gmes_linux_64_4/
./gmes_petap.pl
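If the test run fails, the two usual suspects are the license key and the Perl path. The following sketch checks both. check_genemark is a hypothetical helper, and it assumes that change_path_in_perl_scripts.pl leaves a first line beginning with #!/usr/bin/env perl in gmes_petap.pl.

```shell
# check_genemark GM_DIR [KEY_FILE]
# Checks that the license key exists and is non-empty (default: ~/.gm_key)
# and that gmes_petap.pl starts with the env-based Perl shebang.
check_genemark() {
    local gm_dir="$1" key="${2:-$HOME/.gm_key}" status=0 shebang
    if [ ! -s "$key" ]; then
        echo "license key missing or empty: $key" >&2
        status=1
    fi
    shebang=$(head -n 1 "$gm_dir/gmes_petap.pl" 2>/dev/null)
    case "$shebang" in
        '#!/usr/bin/env perl'*) ;;
        *) echo "unexpected shebang in gmes_petap.pl: $shebang" >&2
           status=1 ;;
    esac
    return "$status"
}

# Usage:
# check_genemark "$FUNGAP_DIR/external/gmes_linux_64_4"
```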
conda activate fungap
cd $(dirname $(which RepeatMasker))/../share/RepeatMasker
# ./configure command will download required databases
echo -e "\n2\n$(dirname $(which rmblastn))\n\n5\n" > tmp && ./configure < tmp
# It should look like this
ls $(dirname $(which RepeatMasker))/../share/RepeatMasker/Libraries
# Artefacts.embl Dfam.hmm RepeatAnnotationData.pm RepeatMasker.lib.nin RepeatPeps.lib RepeatPeps.lib.psq
# CONS-Dfam_3.0 README.meta RepeatMasker.lib RepeatMasker.lib.nsq RepeatPeps.lib.phr RepeatPeps.readme
# Dfam.embl RMRBMeta.embl RepeatMasker.lib.nhr RepeatMaskerLib.embl RepeatPeps.lib.pin taxonomy.dat
The set_dependencies.py script lets you set and test (by running each tool with --help) all the dependencies. If this script runs without any issue, you are ready to run FunGAP!
cd $FUNGAP_DIR
conda activate maker
export MAKER_DIR=$(dirname $(which maker))
echo $MAKER_DIR # /home/ubuntu/anaconda3/envs/maker/bin
conda activate fungap
./set_dependencies.py \
--pfam_db_path db/pfam/ \
--genemark_path external/gmes_linux_64_4/ \
--maker_path ${MAKER_DIR}
You can download the yeast (Saccharomyces cerevisiae) genome assembly (FASTA) and RNA-seq reads (two FASTQ files) from NCBI to test FunGAP.
# Download RNA-seq reads using SRA toolkit (https://github.com/ncbi/sra-tools/wiki/01.-Downloading-SRA-Toolkit)
# Parameter -X indicates the number of read pairs you want to download
fastq-dump -X 1000000 -I --split-files SRR1198667
# Download assembly
wget ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/146/045/GCF_000146045.2_R64/GCF_000146045.2_R64_genomic.fna.gz
gunzip GCF_000146045.2_R64_genomic.fna.gz
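Before starting a long run, it can help to confirm that the two FASTQ files hold the same number of reads. This is a minimal sketch that assumes the standard four-line FASTQ record layout. count_fastq_reads and check_pairs are hypothetical helper names, not part of FunGAP or the SRA toolkit.

```shell
# count_fastq_reads FILE
# Prints the number of records, assuming four lines per record.
count_fastq_reads() {
    awk 'END { print int(NR / 4) }' "$1"
}

# check_pairs R1 R2
# Succeeds only if both files hold the same number of reads.
check_pairs() {
    [ "$(count_fastq_reads "$1")" -eq "$(count_fastq_reads "$2")" ]
}

# Usage:
# check_pairs SRR1198667_1.fastq SRR1198667_2.fastq && echo "read pairs match"
```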
conda activate fungap # if you didn't do it already
$FUNGAP_DIR/download_sister_orgs.py \
--taxon "Saccharomyces cerevisiae" \
--email_address <YOUR_EMAIL_ADDRESS> \
--num_sisters 1
zcat sister_orgs/*faa.gz > prot_db.faa
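An empty prot_db.faa (for example, after a failed download) would only surface much later in the pipeline, so counting its sequences right away is cheap insurance. A minimal sketch; count_fasta_records is a hypothetical helper name.

```shell
# count_fasta_records FILE
# Prints the number of FASTA header lines (lines starting with '>').
# grep -c exits non-zero on zero matches, so '|| true' keeps the
# function's exit status clean while still printing 0.
count_fasta_records() {
    grep -c '^>' "$1" || true
}

# Usage:
# n=$(count_fasta_records prot_db.faa)
# [ "$n" -gt 0 ] || echo "prot_db.faa contains no sequences" >&2
```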
conda activate fungap # if you didn't do it already
$FUNGAP_DIR/get_augustus_species.py \
--genus_name "Saccharomyces" \
--email_address <YOUR_EMAIL_ADDRESS>
- saccharomyces_cerevisiae_S288C
conda activate fungap # if you didn't do it already
$FUNGAP_DIR/fungap.py \
--genome_assembly GCF_000146045.2_R64_genomic.fna \
--trans_read_1 SRR1198667_1.fastq \
--trans_read_2 SRR1198667_2.fastq \
--augustus_species saccharomyces_cerevisiae_S288C \
--busco_dataset ascomycota_odb10 \
--sister_proteome prot_db.faa \
--num_cores 8
FunGAP predicted ~5500 genes in my test run (see the fungap_out/fungap_out output directory). It took about 8 hours on an Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz with 8 CPU cores.
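To compare your own run against that gene count, you can count the gene features in the annotation file. This sketch assumes the output directory contains a GFF3 file with gene features in column 3. The file name is not given here, so pass whatever path your run produced. count_gff3_genes is a hypothetical helper name.

```shell
# count_gff3_genes FILE
# Prints the number of non-comment rows whose feature type (column 3)
# is exactly "gene".
count_gff3_genes() {
    awk -F '\t' '!/^#/ && $3 == "gene" { n++ } END { print n + 0 }' "$1"
}

# Usage (replace the placeholder with the GFF3 file from your run):
# count_gff3_genes <PATH_TO_YOUR_GFF3>
```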