Releases: ComparativeGenomicsToolkit/cactus

Cactus 2.2.0 2022-08-19

19 Aug 15:04
84b9d68

Cactus 2.2.0 is available in the following forms:

WARNING: do not use the source files that GitHub generates automatically (Source code (zip) or Source code (tar.gz)); they are not correct.

The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them. The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).
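
Whether a Linux machine meets the AVX2 requirement can be checked before downloading anything. The following is a minimal sketch and not part of Cactus: it assumes a Linux host that lists CPU features on the `flags` lines of `/proc/cpuinfo`, and the helper names `cpu_flags` and `supports_avx2` are hypothetical.

```python
from pathlib import Path


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    flags = set()
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux kernels list CPU features on lines whose key is "flags".
        if sep and key.strip() == "flags":
            flags.update(value.split())
    return flags


def supports_avx2(cpuinfo_text):
    """Return True if the avx2 feature flag is present."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        if supports_avx2(cpuinfo.read_text()):
            print("AVX2 supported")
        else:
            print("AVX2 not found: consider the binaries for older CPUs")
    else:
        print("/proc/cpuinfo is not available on this system")
```

If the flag is missing, the binaries for older CPU architectures are the ones to try.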

Please subscribe to the cactus-announce low-volume mailing list to receive notice of Cactus releases.

Release notes

This release contains a major update to the "blast" phase, where chaining logic is introduced to select lastz anchors, replacing the old quality-based heuristic. It also uses one fewer outgroup by default (2 instead of 3) and no longer explicitly computes self-alignments, both of which should result in faster runtimes.
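
These notes do not describe the chaining algorithm itself. As a generic illustration of what chaining alignment anchors means, the sketch below picks the highest-scoring set of collinear, non-overlapping anchors with a simple quadratic dynamic program. It is not Cactus's implementation: the `Anchor` type, the `best_chain` function, the same-strand assumption, and the absence of gap penalties are all simplifying assumptions made for this example.

```python
from typing import List, NamedTuple


class Anchor(NamedTuple):
    """A hypothetical same-strand alignment anchor (half-open coordinates)."""
    q_start: int
    q_end: int
    t_start: int
    t_end: int
    score: float


def best_chain(anchors: List[Anchor]) -> List[Anchor]:
    """Return the highest-scoring chain of collinear, non-overlapping anchors."""
    if not anchors:
        return []
    ordered = sorted(anchors, key=lambda a: (a.q_start, a.t_start))
    best = [a.score for a in ordered]  # best chain score ending at anchor i
    prev = [-1] * len(ordered)         # back-pointer to the previous anchor
    for i, cur in enumerate(ordered):
        for j in range(i):
            cand = ordered[j]
            # cand can precede cur only if it ends before cur starts
            # on both the query and the target.
            if cand.q_end <= cur.q_start and cand.t_end <= cur.t_start:
                if best[j] + cur.score > best[i]:
                    best[i] = best[j] + cur.score
                    prev[i] = j
    # Walk back from the best-scoring end point to recover the chain.
    i = max(range(len(ordered)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(ordered[i])
        i = prev[i]
    return chain[::-1]
```

An anchor that scores well on its own but lies off the main diagonal is dropped in favour of lower-scoring anchors that fit a consistent chain, which is the intuition behind preferring chaining to a per-alignment quality filter.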

Other changes include:

  • Complete rewrite and drastic simplification of all code used to generate lastz anchors
  • PAF format now used natively throughout Cactus (replacing lastz cigars)
  • Major refactor and cleanup of the "progressive" python module, removing vestiges of old Progressive Cactus repo
  • Rewrite and simplification of the "Cactus Workflow" Python code.
  • Intermediate files (project, multicactus project, experiment XML) all done away with.
  • More explicit error message for "illegal instruction" signal (which commonly confused people trying to run on older CPUs)
  • Fasta contig name checking and prefixing done at beginning of each tool (this should prevent cryptic halAppendSubtree errors in the pangenome pipeline)
  • Update to newest SegAlign, which should fix an overflow bug that occurs when repeatmasking some data.
  • Increase binary compatibility by linking with newer libxml2
  • Add cactus-terra-helper tool to force-resume Terra workflows (when its own call caching fails)
  • Small cleanup of cactus-preprocess interface
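
For readers who have not met PAF (mentioned in the list above), each record is one tab-separated line with 12 mandatory columns followed by optional SAM-style tags such as `cg:Z:` for the CIGAR string. The parser below is a minimal sketch for inspecting such files; `PafRecord` and `parse_paf_line` are hypothetical names, not part of Cactus, and the contig names in the usage example are made up.

```python
from typing import Dict, NamedTuple


class PafRecord(NamedTuple):
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int
    block_length: int
    mapq: int
    tags: Dict[str, str]


def parse_paf_line(line: str) -> PafRecord:
    """Parse one PAF line: 12 mandatory columns plus optional TAG:TYPE:VALUE tags."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError(f"expected at least 12 columns, found {len(fields)}")
    tags = {}
    for item in fields[12:]:
        name, _value_type, value = item.split(":", 2)
        tags[name] = value
    q_len, q_start, q_end = (int(x) for x in fields[1:4])
    t_len, t_start, t_end = (int(x) for x in fields[6:9])
    matches, block_len, mapq = (int(x) for x in fields[9:12])
    return PafRecord(fields[0], q_len, q_start, q_end, fields[4],
                     fields[5], t_len, t_start, t_end,
                     matches, block_len, mapq, tags)


if __name__ == "__main__":
    line = "contigA\t1000\t100\t200\t+\tcontigB\t2000\t300\t400\t95\t100\t60\tcg:Z:100M"
    rec = parse_paf_line(line)
    # Fraction of matching residues over the alignment block, and the CIGAR tag.
    print(rec.strand, rec.matches / rec.block_length, rec.tags["cg"])
```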

Cactus 2.1.1 2022-06-15

15 Jun 12:37
999c826

Cactus 2.1.1 is available in the following forms:

The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them. The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This release includes:

  • Update SegAlign to fix a crash while lastz-repeatmasking certain (fragmented?) assemblies using GPUs.
  • Add cactus-update-prepare which generates scripts for updating HAL alignments (Thanks @thiagogenez)
  • Upgrade release (CPU) Docker images from Ubuntu 18.04 to Ubuntu 22.04.
  • Upgrade release GPU Docker image from Ubuntu 18.04 / Cuda 10.2 to Ubuntu 20.04 / Cuda 11.4.3 (the most recent Cuda currently supported by Terra)

Cactus 2.1.0 2022-06-02

02 Jun 16:28
f887107

The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them. The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This release introduces a major overhaul to the Minigraph-Cactus Pangenome Pipeline, including:

  • Total documentation rewrite (doc/pangenome.md) with more explanations, a new example (data included) for yeast, and detailed instructions that exactly reproduce an HPRC pangenome.
  • Incorporation of latest minigraph version that can write base alignments. These alignments, via GAF cigars, are now used by the Minigraph-Cactus pipeline rather than the raw minimizers.
  • Masking with dna-brnn is no longer needed or recommended (but it is still supported). Instead, a graph with the full sequences is constructed and any trimming is done based on the alignment in postprocessing (via cactus-graphmap-join).
  • Better Continuous Integration testing for the entire pangenome pipeline.

Graphs constructed with the new, simpler pipeline should be slightly more accurate and much cleaner.

Other changes include:

  • Fix bug in Cactus (since v2.0) that sometimes caused spurious tiny self-alignments.
  • Update to newer version of abPOA (improves stability, and some corner case accuracy)
  • Fix Dockerfile so that Cactus Docker images are now much (5X) smaller.
  • Fix HAL support for remote files in Cactus Docker images.
  • Update HAL library to patched version that works for alignment updates (as described in doc/updating-alignments.md)

The --gpu option still doesn't always work. When using the GPU outside the gpu Docker Release, it is still advised to set gpuLastz="true" in src/cactus/cactus_progressive_config.xml (and rerun pip install -U).
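
The config edit described above can also be scripted. The following is a minimal sketch that assumes only that `gpuLastz` appears as an XML attribute somewhere in the file; these notes do not name the enclosing element, so the code checks every element. The function name `enable_gpu_lastz` is hypothetical, as is the `blast` element in the usage example, and the comment-preserving parser requires Python 3.8 or later.

```python
import xml.etree.ElementTree as ET


def enable_gpu_lastz(xml_text):
    """Set every gpuLastz attribute to "true".

    Returns the updated XML text and the number of attributes changed.
    """
    # Keep comments that appear inside the root element (Python 3.8+).
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    changed = 0
    for element in root.iter():
        if "gpuLastz" in element.attrib:
            element.set("gpuLastz", "true")
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


if __name__ == "__main__":
    # Hypothetical fragment; the real config file has many more settings.
    sample = '<config><!-- note --><blast gpuLastz="false" other="1"/></config>'
    updated, count = enable_gpu_lastz(sample)
    print(count)
    print(updated)
```

Apply it to a copy of src/cactus/cactus_progressive_config.xml, check that the count is not zero, and then rerun pip install -U as described above.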

Cactus 2.0.5 2022-01-25

25 Jan 16:16
f1eef40

Cactus 2.0.5 is available in the following forms:

The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them. The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This release fixes a major (though rare) bug where the reference phase could take forever on some inputs. It includes a newer version of lastz which seems to fix some crashes as well.

  • Debug symbols no longer stripped from cactus_consolidated binary in Release.
  • Update to Toil 3.5.6
  • Update examples to use Python 3.8 specifically (was previously just python3, but this is often python3.6, support for which was dropped in Toil 3.5.6)
  • cactus-prepare WDL output can now batch up hal_append_subtree jobs
  • Fix bug where "reference" phase within cactus_consolidated could take ages on some input
  • Fix bug where --realTimeLogging flag would cause infinite loop after cactus_consolidated within some Docker invocations.
  • Upgrade to more recent version of lastz

Cactus 2.0.4 2021-11-12

12 Nov 18:37
eca7219

Cactus 2.0.4 is available in the following forms:

The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them. The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This release fixes a bug introduced in version 2.0.0 where ancestral sequences could not be specified in the input, which prevented the recommended procedure for updating existing alignments from working.

  • Fix Assertion `cap_getSequence(cap) == sequence' failed error when an ancestral fasta is provided in the input seqfile.
  • Several minor pangenome updates, mostly in cactus-graphmap-join

Cactus 2.0.3 2021-07-22

22 Jul 21:48
8aca6e5

Cactus 2.0.3 is available in the following forms:

The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them. The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This release fixes some issues in pangenome normalization and CAF running time.

  • Fix new regression that caused CAF's secondary filter to sometimes take forever. This code has been causing occasional slowdowns for some time, but should finally be fixed once and for all.
  • Fix cactus-preprocess to work on zipped fasta inputs even when not running dna-brnn.
  • Fix normalization in cactus-graphmap-join
  • Update to abPOA v1.2.5

Cactus 2.0.2 2021-07-07

07 Jul 17:14
95fc1b9

Cactus 2.0.2 is available in the following forms:

The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them. The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This release primarily addresses stability issues during pangenome construction.

Changelog:

  • Use latest abpoa, which fixes bug where aligning >1024 sequences would lead to a segfault
  • Update to Toil 5.4.0
  • More consistently apply filters to minimap2 output in the fallback stage of graphmap-split
  • Build abpoa with AVX2 SIMD extensions instead of SSE4.1 in order to work around instability when building pangenomes. This ups the hardware requirements for releases, unfortunately, as AVX2 is slightly newer.
  • Clean up CAF config parameters
  • Fix CAF secondary filter worst-case runtime issue. It was very rare but could add days to runtime.
  • Slightly tune minimap2 thresholds used for chromosome splitting
  • Normalization option added to cactus-graphmap-join (should be used to work around soon-to-be addressed underalignment bug)

Cactus 2.0.1 2021-06-19

19 Jun 23:49
dd5a058

Cactus 2.0.1 is available in the following forms:

The Docker images and binaries linked above are built for the Intel Nehalem architecture and require a CPU that supports it (typically something from 2008 or later). The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This is a patch release that fixes an issue where the new --consCores option could not be used with --maxCores (Thanks @RenzoTale88). It also reverts some last-minute CAF parameter changes to something more tested (though known to be slow in some cases with large numbers of secondary alignments).

Changelog:

  • Fix bug where cactus doesn't work when both --maxCores and --consCores are specified.
  • Static binaries script more portable.
  • Revert CAF trimming parameters to their previous defaults.

Cactus 2.0.0 2021-06-18

18 Jun 20:01
c489c08

Cactus 2.0.0 is available in the following forms:

The Docker images and binaries linked above are built for the Intel Nehalem architecture and require a CPU that supports it (typically something from 2008 or later). The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This release includes a major update to the Cactus workflow which should dramatically improve both speed and robustness. Previously, Cactus used a multiprocess architecture for all cactus graph operations (everything after the "blast" phase). Each process was run in its own Toil job, and they would communicate via the CactusDisk database that ran as its own separate service process (ktserver by default). Reading from and writing to the database was often a bottleneck, and it would fail sporadically on larger inputs with frustrating "network errors". This has all now been changed to run as a single multithreaded executable, cactus_consolidated. Apart from saving on database I/O, cactus_consolidated now uses the much faster, SIMD-accelerated abPOA by default instead of cPecan for performing multiple sequence alignments within the BAR phase.

Cactus was originally designed for a heterogeneous compute environment where a handful of large memory machines ran a small number of jobs, and much of the compute could be farmed off to a large number of smaller machines. While lastz jobs from the "preprocess" and "blast" phases (or cactus-preprocess and cactus-blast) can still be farmed out to smaller machines, the rest of cactus (cactus-align) can now only be run on more powerful systems. The exact requirements depend as usual on genome size and divergence, but roughly 64 cores / 512G RAM are required for distant mammals.

This release also contains several fixes and usability improvements for the pangenome pipeline, and finally includes halPhyloP.

Changelog:

  • Fold all post-blast processing into a single binary executable, cactus_consolidated
  • New option, --consCores, to control the number of threads for each cactus_consolidated process.
  • Cactus database (ktserver) no longer used.
  • abPOA now default base aligner, replacing cPecan
  • cPecan updated to include multithreading support via MUM anchors (as opposed to spawning lastz processes), and can be toggled on in the config
  • Fix bug in how cactus-prepare transmits Toil size parameters
  • cactus-prepare-join tool added to combine and index chromosome output from cactus-align-batch
  • cactus-graphmap-split fixes
  • Update to latest Segalign
  • Update to Toil 5.3
  • Update HAL
  • Add halPhyloP to binary release and docker images

Cactus 1.3.0 2021-02-11

12 Feb 00:29
dec2b45

Cactus 1.3.0 is available in the following forms:

The Docker images and binaries linked above are built for the Intel Nehalem architecture and require a CPU that supports it (typically something from 2008 or later). The exception is the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but don't yet support Cactus's pangenome pipeline).

Release notes

This release introduces the Cactus Pangenome Pipeline, which can be used to align together samples from the same species in order to create a pangenome graph:

  • cactus_bar now has a POA-mode via the abpoa aligner, which scales better than Pecan for large numbers of sequences and is nearly as accurate if the sequences are highly similar
  • cactus-refmap tool added to produce cactus alignment anchors with all-to-reference minimap2 alignments instead of all-to-all lastz
  • cactus-graphmap tool added to produce cactus alignment anchors with all-to-reference-graph minigraph alignments instead of all-to-all lastz
  • --maskAlpha option added to cactus-preprocess to softmask (or clip out) satellite sequence using dna-brnn.
  • cactus_bar now has an option to ignore masked sequence with a given length threshold.
  • cactus-graphmap-split tool added to split input fasta sequences into chromosomes using minigraph alignments in order to create alignment subproblems and improve scaling.
  • cactus-align-batch tool added to align several chromosomes at once using one job per chromosome. (--batch option added to cactus-align to achieve the same using many jobs per chromosome)
  • --outVG and --outGFA options added to cactus-align to output pangenome graphs in addition to HAL.

Other changes:

  • cactus-prepare scheduling bug fix
  • --database redis option added to use Redis instead of Kyoto Tycoon
  • cactus-blast --restart bug fix
  • "Legacy" binary release provided for those whose hardware is too old to run the normal release.