zol (& fai)

zol (& fai): tools for targeted searching and evolutionary investigations of gene clusters (sets of co-located genes - e.g. biosynthetic gene clusters, viruses/phages, operons, etc.).

First, fai allows users to search for homologous/orthologous instances of a query gene cluster in a database of (meta-)genomes. Similar tools exist, including convenient webservers (which we highlight and recommend as alternatives on this documentation page), but fai also offers some unique or rarer options. Mainly, fai checks whether gene cluster hits in target (meta-)genomes lie on scaffold/contig edges and takes this into account during both detection and downstream assessment. For example, fai marks individual coding genes and gene cluster instances that are on the edge of a scaffold/contig, and these marks can then be used as a filter in zol. This is important when calculating the conservation of genes across homologous gene clusters, because a gene missing from a cluster truncated at a contig edge may reflect assembly fragmentation rather than true absence.

After finding homologous instances of a gene cluster, using fai or other software, users often wish to investigate the similarity between instances. This is often done through pairwise similarity assessment and visualization with tools such as clinker and gggenomes. While these tools are great, with hundreds or thousands of gene cluster instances such visualizations can become overwhelming and computationally expensive to render. To simplify the identification of interesting functional, evolutionary, and conservation patterns across hundreds to thousands of homologous gene cluster instances, we developed zol, which performs de novo ortholog group prediction and creates detailed color-formatted XLSX spreadsheets summarizing the information. More recently, we have also introduced scalable visualization tools (cgc & cgcg) that allow for simpler assessment of information represented across thousands of homologous gene cluster instances.

Citation:

zol & fai: large-scale targeted detection and evolutionary investigation of gene clusters. bioRxiv 2023. Rauf Salamzade, Patricia Q Tran, Cody Martin, Abigail L Manson, Michael S Gilmore, Ashlee M Earl, Karthik Anantharaman, Lindsay R Kalan

In addition, please cite important dependency software or databases for your specific analysis accordingly.

Main Contents:

Auxiliary tools within the suite:

Short Note on Resource Requirements:

Different programs in the zol suite have different resource requirements. Moving forward, the default settings in the zol program itself should usually allow for low memory usage and fast runtime. For thousands of gene cluster instances, we recommend either using the dereplication/reinflation approach (see the manuscript for a comparison of evolutionary statistics between this approach and full processing) or using CD-HIT clustering (a greedy incremental clustering approach, nicely illustrated and explained on the MMseqs2 wiki) to determine protein clusters/families (not true ortholog groups). Disk space is generally not a major concern for zol analysis, but when working with thousands of gene clusters, disk usage can temporarily grow large.

Available disk space is, however, the primary concern for fai and prepTG. This mostly affects users interested in constructing and searching large databases (containing over a thousand genomes). Generally, prepTG and fai are designed to work on metagenomic as well as genomic datasets and do not have high memory usage, but genomic files add up in space and DIAMOND alignment files can get quite large as well.
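To see where disk space is going while a large database or search is being built, a small helper along these lines may be useful. This is a minimal sketch and not part of the zol suite: the function name `largest_entries` and the example path are hypothetical, and it relies only on standard `du`, `sort`, and `head`.

```shell
# Print the N largest entries (sizes in KB) directly under a directory.
# Hypothetical helper for illustration; not shipped with zol, fai, or prepTG.
largest_entries() {
    dir="$1"
    n="${2:-10}"
    # -s: one total per entry; -k: report sizes in 1024-byte blocks
    du -sk "$dir"/* 2>/dev/null | sort -rn | head -n "$n"
}

# Example (the path is a placeholder):
#   largest_entries /path/to/results_directory 5
```

Running it periodically on an output directory shows whether genome files or alignment files are the ones accumulating.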

Installation:

Bioconda (Recommended):

Note: for some setups at least, it is critical to specify the conda-forge channel before the bioconda channel to configure channel priority properly and ensure a successful installation.

Recommended: For a significantly faster installation process, install mamba in your base conda environment and use mamba in place of conda in the commands below.
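The choice between mamba and conda can be made automatically. The sketch below assumes only that one of the two is on your PATH; the helper name `pick_installer` is made up for illustration. It prints the create command instead of running it, so you can review it first.

```shell
# Pick mamba when it is on PATH, otherwise fall back to conda.
# Hypothetical helper for illustration.
pick_installer() {
    if command -v mamba >/dev/null 2>&1; then
        echo mamba
    else
        echo conda
    fi
}

# Print (rather than run) the create command so it can be reviewed first.
# conda-forge is listed before bioconda, as the channel-order note requires.
echo "$(pick_installer) create -n zol_env -c conda-forge -c bioconda zol"
```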

# 1. install and activate zol
conda create -n zol_env -c conda-forge -c bioconda zol
conda activate zol_env

# 2. download the annotation databases
# depending on internet speed, this can take 20-30 minutes
# end product will be ~40 GB! You can also run in minimal mode
# (which only downloads Pfam & PGAP HMM models, ~8.5 GB)
# using the -m argument.
setup_annotation_dbs.py [-m]

Note

When you create a conda environment using -n, the environment will typically be stored in your home directory. However, because the databases can be large, you might prefer to set up the conda environment somewhere else on your system with more space, using -p. For instance: conda create -p /path/to/drive_with_more_space/zol_conda_env/ -c conda-forge -c bioconda zol. Afterwards, you would activate this environment by providing the path to it: conda activate /path/to/drive_with_more_space/zol_conda_env/
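Before downloading the databases, it can help to check how much space is free at the intended location. The following is a rough sketch: `free_gb` and `suggest_db_setup` are hypothetical helpers, and the thresholds simply mirror the ~40 GB (full) and ~8.5 GB (minimal, -m) sizes quoted above with no extra headroom, so treat the suggestion as a lower bound.

```shell
# Free space, in whole GB, on the filesystem holding a directory.
free_gb() {
    # -P: POSIX output format; -k: 1024-byte blocks; column 4 = available
    df -Pk "$1" | awk 'NR==2 {print int($4 / 1024 / 1024)}'
}

# Suggest a database setup command for a given amount of free space (GB).
# Thresholds are assumptions based on the approximate sizes in this README.
suggest_db_setup() {
    if [ "$1" -ge 40 ]; then
        echo "setup_annotation_dbs.py"
    elif [ "$1" -ge 9 ]; then
        echo "setup_annotation_dbs.py -m"
    else
        echo "not enough free space for either mode" >&2
        return 1
    fi
}

# Example (replace the path with the drive you plan to use):
#   suggest_db_setup "$(free_gb /path/to/drive_with_more_space)"
```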

Docker:

Requires docker to be installed on your system!

To keep the Docker image size relatively low (currently ~13 GB), only the Pfam and PGAP HMMs/databases are included.

# get wrapper script from GitHub
wget https://raw.githubusercontent.com/Kalan-Lab/zol/main/docker/run_ZOL.sh

# change permissions to allow execution
chmod a+x ./run_ZOL.sh

# run script
./run_ZOL.sh
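Since the wrapper depends on docker (and the download step on wget), a quick check that both are available can save a confusing failure. This is a generic sketch; `require_cmds` is a hypothetical helper and not part of the wrapper script.

```shell
# Return success only if every named command is on PATH.
# Hypothetical helper for illustration.
require_cmds() {
    for cmd in "$@"; do
        if ! command -v "$cmd" >/dev/null 2>&1; then
            echo "missing required command: $cmd" >&2
            return 1
        fi
    done
}

# Example:
#   require_cmds docker wget && ./run_ZOL.sh
```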

Test case:

Following installation, you can run a provided test case focused on a subset of Enterococcal polysaccharide antigen instances in E. faecalis and E. faecium as follows:

Bioconda:

# download test data tar.gz and bash script for running tests
wget https://github.com/Kalan-Lab/zol/raw/main/test_case.tar.gz
wget https://raw.githubusercontent.com/Kalan-Lab/zol/main/run_tests.sh

# run bash-based testing script
bash run_tests.sh
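If a download is interrupted or returns an error page, the saved file will not be a valid archive. A quick integrity check before running the tests might look like the sketch below; `is_valid_targz` is a hypothetical helper, and it is an assumption here that run_tests.sh expects test_case.tar.gz in the current directory.

```shell
# Succeed only if the file is a readable gzip-compressed tar archive.
# Hypothetical helper for illustration.
is_valid_targz() {
    # -t lists the archive contents without extracting; -z handles gzip
    tar -tzf "$1" >/dev/null 2>&1
}

# Example:
#   is_valid_targz test_case.tar.gz && bash run_tests.sh
```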

Docker:

# download test script from GitHub (a bash script which you can also reference to learn how to run zol)
wget https://raw.githubusercontent.com/Kalan-Lab/zol/main/docker/test_docker.sh

# change permissions to allow execution
chmod a+x ./test_docker.sh

# run tests
./test_docker.sh

Note, the script test_docker.sh must be run in the same folder as run_ZOL.sh!
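The same-folder requirement can be checked up front. The sketch below assumes both scripts were downloaded and made executable as shown earlier; `docker_scripts_ready` is a hypothetical helper name.

```shell
# Succeed only if both Docker scripts sit in the given directory
# (default: current directory) and are executable.
# Hypothetical helper for illustration.
docker_scripts_ready() {
    dir="${1:-.}"
    for script in run_ZOL.sh test_docker.sh; do
        if [ ! -x "$dir/$script" ]; then
            echo "missing or not executable: $dir/$script" >&2
            return 1
        fi
    done
}

# Example:
#   docker_scripts_ready && ./test_docker.sh
```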

License:

BSD 3-Clause License

Copyright (c) 2023, Kalan-Lab
All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are met:

1. Redistributions of source code must retain the above copyright notice, this
   list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice,
   this list of conditions and the following disclaimer in the documentation
   and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its
   contributors may be used to endorse or promote products derived from
   this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

