Release Cactus 2.0.0 2021-06-18 · ComparativeGenomicsToolkit/cactus

Cactus 2.0.0 is available in the following forms:

Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.0.0
GPU-accelerated Docker Image: quay.io/comparative-genomics-toolkit/cactus:v2.0.0-gpu
Install instructions in README.md
Pre-compiled Binaries Linux Tarball: cactus-bin-v2.0.0.tar.gz
Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.0.0.tar.gz
Install instructions in BIN-INSTALL.md
Source Tarball: cactus-v2.0.0.tar.gz
Install instructions in README.md

WARNING: Do not use the source files that GitHub generates automatically (Source code (zip) or Source code (tar.gz)); these are not correct.

The Docker images and binaries linked above are built for the Intel Nehalem architecture and require a CPU that supports it (typically something from 2008 or later). The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture but do not yet support Cactus's pangenome pipeline.
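One way to decide between the two binary tarballs is to inspect the CPU feature flags that Linux reports. The following is a minimal sketch, not part of Cactus: the helper names are hypothetical, and treating the presence of `sse4_2` and `popcnt` as a proxy for "Nehalem or newer" is an assumption rather than a check the release itself documents.

```python
# Assumption: SSE4.2 and POPCNT are used here as a proxy for
# Nehalem-level CPU support. The release notes do not specify a check.
NEHALEM_PROXY_FLAGS = {"sse4_2", "popcnt"}


def cpu_flags(cpuinfo_text):
    """Return the feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()


def pick_tarball(cpuinfo_text):
    """Suggest a release tarball name based on the reported CPU flags."""
    if NEHALEM_PROXY_FLAGS <= cpu_flags(cpuinfo_text):
        return "cactus-bin-v2.0.0.tar.gz"
    # Fallback build: broader CPU compatibility, no pangenome support.
    return "cactus-bin-legacy-v2.0.0.tar.gz"
```

On a Linux machine, calling `pick_tarball(open("/proc/cpuinfo").read())` would return one of the two tarball names listed above. If the flags line is missing, the sketch falls back to the legacy build.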

Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus releases.

Release notes

This release includes a major update to the Cactus workflow which should dramatically improve both speed and robustness. Previously, Cactus used a multiprocess architecture for all cactus graph operations (everything after the "blast" phase). Each process ran in its own Toil job, and the processes communicated via the CactusDisk database, which ran as a separate service process (ktserver by default). Reading from and writing to the database was often a bottleneck, and it would fail sporadically on larger inputs with frustrating "network errors". All of this now runs as a single multithreaded executable, cactus_consolidated. Apart from saving on database I/O, cactus_consolidated now uses the much faster, SIMD-accelerated abPOA by default instead of cPecan for performing multiple sequence alignments within the BAR phase.

Cactus was originally designed for a heterogeneous compute environment where a handful of large-memory machines ran a small number of jobs, and much of the compute could be farmed out to a large number of smaller machines. While lastz jobs from the "preprocess" and "blast" phases (or cactus-preprocess and cactus-blast) can still be farmed out to smaller machines, the rest of Cactus (cactus-align) can now only be run on more powerful systems. The exact requirements depend, as usual, on genome size and divergence, but roughly 64 cores / 512G RAM are required for distant mammals.
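A quick pre-flight comparison of a machine against that rough guideline could look like the sketch below. The function names are hypothetical, the thresholds are only the approximate distant-mammal figures quoted above, and reading "G" as 2**30 bytes is an assumption. The memory query relies on `os.sysconf`, which is available on Linux-like systems.

```python
import os

# Rough figures quoted in the release notes for distant mammals.
GUIDE_CORES = 64
GUIDE_RAM_G = 512


def meets_guideline(cores, ram_g, min_cores=GUIDE_CORES, min_ram_g=GUIDE_RAM_G):
    """Return True if both the core count and the RAM reach the guideline."""
    return cores >= min_cores and ram_g >= min_ram_g


def local_resources():
    """Return (logical cores, physical RAM in G) for the current machine."""
    cores = os.cpu_count() or 1
    ram_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    return cores, ram_bytes / 2**30  # assumption: 1 G = 2**30 bytes
```

Calling `meets_guideline(*local_resources())` gives a yes/no answer for the cactus-align host. The memory an operating system reports is usually slightly below the nominal installed amount, so a near miss is best read as informational rather than as a hard failure.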

This release also contains several fixes and usability improvements for the pangenome pipeline, and finally includes halPhyloP.

Changelog:

Fold all post-blast processing into a single binary executable, cactus_consolidated
New option, --consCores, to control the number of threads for each cactus_consolidated process.
Cactus database (ktserver) no longer used.
abPOA now default base aligner, replacing cPecan
cPecan updated to include multithreading support via MUM anchors (as opposed to spawning lastz processes), and can be toggled on in the config
Fix bug in how cactus-prepare transmits Toil size parameters
cactus-prepare-join tool added to combine and index chromosome output from cactus-align-batch
cactus-graphmap-split fixes
Update to latest Segalign
Update to Toil 5.3
Update HAL
Add halPhyloP to binary release and docker images
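As an illustration of the new --consCores option, the sketch below assembles a command line that fixes the thread count for each cactus_consolidated process. It assumes the positional form `cactus <jobStore> <seqFile> <outputHal>` and that the top-level `cactus` command accepts --consCores; confirm both with `cactus --help`. The helper name and the file paths are hypothetical.

```python
import shlex


def cactus_command(job_store, seq_file, out_hal, cons_cores):
    """Build an argv list for a cactus run with a fixed --consCores value."""
    if cons_cores < 1:
        raise ValueError("cons_cores must be at least 1")
    # Assumed positional order: job store, sequence file, output HAL file.
    return ["cactus", job_store, seq_file, out_hal, "--consCores", str(cons_cores)]


# Hypothetical paths, for illustration only.
argv = cactus_command("./jobstore", "./genomes.txt", "./out.hal", 16)
print(shlex.join(argv))
```

The resulting list can be passed directly to `subprocess.run(argv, check=True)`, which avoids shell quoting issues when paths contain spaces.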
