
Cactus 2.0.0 2021-06-18

@glennhickey glennhickey released this 18 Jun 20:01
c489c08

Cactus 2.0.0 is available in the following forms:

WARNING: do not use the source files that GitHub generates automatically (Source code (zip) or Source code (tar.gz)); they are not correct.

The Docker images and binaries linked above are built for the Intel Nehalem architecture and require a CPU that supports it (typically one from 2008 or later). The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture but do not yet support Cactus's pangenome pipeline.

Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.

Release notes

This release includes a major update to the Cactus workflow which should dramatically improve both speed and robustness. Previously, Cactus used a multiprocess architecture for all cactus graph operations (everything after the "blast" phase). Each process ran in its own Toil job, and the processes communicated via the CactusDisk database, which ran as its own separate service process (ktserver by default). Reading from and writing to the database was often a bottleneck, and it would fail sporadically on larger inputs with frustrating "network errors". All of this now runs as a single multithreaded executable, cactus_consolidated. Apart from saving on database I/O, cactus_consolidated now uses the much faster, SIMD-accelerated abPOA by default instead of cPecan for performing multiple sequence alignments within the BAR phase.

Cactus was originally designed for a heterogeneous compute environment where a handful of large-memory machines ran a small number of jobs, and much of the compute could be farmed out to a large number of smaller machines. While lastz jobs from the "preprocess" and "blast" phases (or cactus-preprocess and cactus-blast) can still be farmed out to smaller machines, the rest of Cactus (cactus-align) can now only be run on more powerful systems. The exact requirements depend, as usual, on genome size and divergence, but roughly 64 cores / 512G RAM are required for distant mammals.
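As a rough illustration of the sizing guideline above, the following sketch checks whether a host has at least 64 cores and 512 GiB of RAM. It is not part of Cactus: the function names are hypothetical, "512G" is assumed to mean GiB, and the figures are only the approximate ones quoted for distant mammals, so other inputs may need more or less.

```python
import os

# Approximate figures quoted in the release notes for distant mammals.
# Treating "512G" as GiB is an assumption made for this sketch.
GUIDELINE_CORES = 64
GUIDELINE_RAM_GIB = 512


def meets_guideline(cores, ram_gib,
                    min_cores=GUIDELINE_CORES,
                    min_ram_gib=GUIDELINE_RAM_GIB):
    """Return True if the host has at least the guideline cores and RAM."""
    return cores >= min_cores and ram_gib >= min_ram_gib


def detect_host():
    """Best-effort (cores, ram_gib) for this host; ram_gib is None if unknown."""
    cores = os.cpu_count() or 1
    try:
        # Available on Linux and most Unix-like systems, not on Windows.
        ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
        ram_gib = ram_bytes / 2**30
    except (AttributeError, ValueError, OSError):
        ram_gib = None
    return cores, ram_gib


if __name__ == "__main__":
    cores, ram_gib = detect_host()
    if ram_gib is None:
        print(f"{cores} cores detected; RAM size could not be determined")
    else:
        verdict = "meets" if meets_guideline(cores, ram_gib) else "is below"
        print(f"{cores} cores, {ram_gib:.0f} GiB RAM: {verdict} the rough guideline")
```

A check like this could be run on the machine intended for cactus-align, while the cactus-preprocess and cactus-blast jobs can still go to smaller nodes.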

This release also contains several fixes and usability improvements for the pangenome pipeline, and finally includes halPhyloP.

Changelog:

  • Fold all post-blast processing into a single binary executable, cactus_consolidated
  • New option, --consCores, to control the number of threads for each cactus_consolidated process
  • Cactus database (ktserver) no longer used.
  • abPOA now default base aligner, replacing cPecan
  • cPecan updated to include multithreading support via MUM anchors (as opposed to spawning lastz processes), and can be toggled on in the config
  • Fix bug in how cactus-prepare transmits Toil size parameters
  • cactus-prepare-join tool added to combine and index chromosome output from cactus-align-batch
  • cactus-graphmap-split fixes
  • Update to latest Segalign
  • Update to Toil 5.3
  • Update HAL
  • Add halPhyloP to binary release and docker images
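
The --consCores option listed above sets the thread count for each cactus_consolidated process. A minimal sketch of how a wrapper script might pick that value follows. The helper name and the policy of dividing cores evenly among concurrent processes are assumptions made for illustration, not behaviour documented in these notes.

```python
def cons_cores_args(total_cores, concurrent_processes=1):
    """Build the --consCores argument pair for one cactus_consolidated process.

    Hypothetical helper: splits the host's cores evenly across the
    cactus_consolidated processes expected to run at the same time,
    never going below one thread per process.
    """
    if total_cores < 1 or concurrent_processes < 1:
        raise ValueError("total_cores and concurrent_processes must be >= 1")
    per_process = max(1, total_cores // concurrent_processes)
    return ["--consCores", str(per_process)]


if __name__ == "__main__":
    # Example: a 64-core host expected to run 4 processes at once.
    print(cons_cores_args(64, 4))
```

The returned list is meant to be appended to the argument list of whatever command launches the alignment step; the remaining arguments are deliberately left out here because they depend on the workflow.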