Releases: ComparativeGenomicsToolkit/cactus
Cactus 2.2.0 2022-08-19
Cactus 2.2.0 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.2.0`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.2.0-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.2.0.tar.gz
  - Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.2.0.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.2.0.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
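The AVX2 requirement noted above can be checked before choosing a download. The following is a minimal sketch, not part of Cactus: it assumes a Linux machine where `/proc/cpuinfo` lists CPU features on a `flags` line, and the function names `cpu_flags` and `suggested_build` are made up for illustration.

```python
import os


def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def suggested_build(cpuinfo_text):
    """Suggest 'standard' when avx2 is listed, otherwise 'legacy' (hypothetical labels)."""
    return "standard" if "avx2" in cpu_flags(cpuinfo_text) else "legacy"


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only; other platforms expose CPU features differently
    if os.path.exists(path):
        with open(path) as handle:
            print(suggested_build(handle.read()))
```

If the script prints `legacy`, the "Older CPU Architectures" tarball is probably the safer choice, keeping in mind that it has no pangenome support.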
Release notes
This release contains a major update to the "blast" phase, where chaining logic is introduced to select lastz anchors, replacing the old quality-based heuristic. It also uses 1 fewer outgroup (2 instead of 3) by default, and no longer explicitly computes self-alignments, which should result in faster runtimes.
Other changes include:
- Complete rewrite and drastic simplification of all code used to generate lastz anchors
- PAF format now used natively throughout Cactus (replacing lastz cigars)
- Major refactor and cleanup of the "progressive" python module, removing vestiges of old Progressive Cactus repo
- Rewrite and simplification of the "Cactus Workflow" Python code.
- Intermediate files (project, multicactus project, experiment XML) all done away with.
- More explicit error message for "illegal instruction" signal (which commonly confused people trying to run on older CPUs)
- Fasta contig name checking and prefixing done at beginning of each tool (this should prevent cryptic `halAppendSubtree` errors in the pangenome pipeline)
- Update to newest SegAlign, which should fix an overflow bug that occurs when repeatmasking some data.
- Increase binary compatibility by linking with newer libxml2
- Add `cactus-terra-helper` tool to force-resume Terra workflows (when its own call caching fails)
- Small cleanup of `cactus-preprocess` interface
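For readers unfamiliar with PAF, the format mentioned in the list above, here is a minimal parsing sketch. It follows the general PAF layout (12 mandatory tab-separated columns followed by optional `name:type:value` tags) and is not taken from the Cactus code base; the `PafRecord` type, the `parse_paf_line` function, and the example line are all illustrative assumptions.

```python
from typing import Dict, NamedTuple


class PafRecord(NamedTuple):
    query_name: str
    query_length: int
    query_start: int
    query_end: int
    strand: str
    target_name: str
    target_length: int
    target_start: int
    target_end: int
    matches: int
    block_length: int
    mapq: int
    tags: Dict[str, str]


def parse_paf_line(line):
    """Parse one PAF line into a PafRecord, keeping optional tags as strings."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("PAF lines need at least 12 tab-separated columns")
    tags = {}
    for field in fields[12:]:
        # Optional fields look like cg:Z:300M (name, type code, value).
        name, _type, value = field.split(":", 2)
        tags[name] = value
    return PafRecord(
        fields[0], int(fields[1]), int(fields[2]), int(fields[3]), fields[4],
        fields[5], int(fields[6]), int(fields[7]), int(fields[8]),
        int(fields[9]), int(fields[10]), int(fields[11]), tags,
    )


# Made-up example alignment with a CIGAR string stored in the cg tag.
example = "chr1_query\t1000\t100\t400\t+\tchr1_target\t5000\t2100\t2400\t290\t300\t60\tcg:Z:300M"
record = parse_paf_line(example)
print(record.query_name, record.target_start, record.tags["cg"])
```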
Cactus 2.1.1 2022-06-15
Cactus 2.1.1 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.1.1`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.1.1-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.1.1.tar.gz
  - Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.1.1.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.1.1.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This release includes:
- Update Segalign to fix crash while lastz-repeatmasking certain (fragmented?) assemblies using GPUs.
- Add cactus-update-prepare which generates scripts for updating HAL alignments (Thanks @thiagogenez)
- Upgrade release (CPU) Docker images from Ubuntu 18.04 to Ubuntu 22.04.
- Upgrade release GPU Docker image from Ubuntu 18.04 / Cuda 10.2 to Ubuntu 20.04 / Cuda 11.4.3 (the most recent Cuda currently supported by Terra)
Cactus 2.1.0 2022-06-02
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.1.0`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.1.0-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.1.0.tar.gz
  - Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.1.0.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.1.0.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This release introduces a major overhaul to the Minigraph-Cactus Pangenome Pipeline, including:
- Total documentation rewrite (doc/pangenome.md) with more explanations, a new example (data included) for yeast, and detailed instructions that exactly reproduce an HPRC pangenome.
- Incorporation of latest minigraph version that can write base alignments. These alignments, via GAF cigars, are now used by the Minigraph-Cactus pipeline rather than the raw minimizers.
- Masking with dna-brnn is no longer needed or recommended (but it is still supported). Instead, a graph with the full sequences is constructed and any trimming is done based on the alignment in postprocessing (via `cactus-graphmap-join`).
- Better Continuous Integration testing for the entire pangenome pipeline.
Graphs constructed with the new, simpler pipeline should be slightly more accurate and much cleaner.
Other changes include:
- Fix bug in Cactus (since v2.0) that sometimes caused spurious tiny self-alignments.
- Update to newer version of abPOA (improves stability, and some corner case accuracy)
- Fix Dockerfile so that Cactus Docker images are now much (5X) smaller.
- Fix HAL support for remote files in Cactus Docker images.
- Update HAL library to patched version that works for alignment updates (as described in doc/updating-alignments.md)
The `--gpu` option still doesn't always work. When using the GPU outside the gpu Docker Release, it is still advised to set `gpuLastz="true"` in `src/cactus/cactus_progressive_config.xml` (and rerun `pip install -U`).
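The `gpuLastz="true"` edit described above can also be scripted. The sketch below is an assumption-laden illustration rather than an official procedure: it does not know which element of the real config carries the attribute, so it flips `gpuLastz` wherever it already appears, and the `sample` XML is a hypothetical stand-in for `cactus_progressive_config.xml`.

```python
import xml.etree.ElementTree as ET


def enable_gpu_lastz(xml_text):
    """Set gpuLastz="true" on every element that already carries the attribute."""
    root = ET.fromstring(xml_text)
    changed = 0
    for element in root.iter():
        if "gpuLastz" in element.attrib:
            element.set("gpuLastz", "true")
            changed += 1
    return ET.tostring(root, encoding="unicode"), changed


# Hypothetical stand-in for the real config; element names are illustrative only.
sample = '<config><stage name="example" gpuLastz="false" /></config>'
updated, count = enable_gpu_lastz(sample)
print(count, updated)
```

Note that `xml.etree.ElementTree` drops XML comments with its default parser, so editing the file by hand may be preferable if the comments in the config matter to you.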
Cactus 2.0.5 2022-01-25
Cactus 2.0.5 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.5`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.5-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.0.5.tar.gz
  - Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.0.5.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.0.5.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This release fixes a major (though rare) bug where the reference phase could take forever on some inputs. It also includes a newer version of lastz, which seems to fix some crashes as well.
- Debug symbols no longer stripped from `cactus_consolidated` binary in Release.
- Update to Toil 5.6
- Update examples to use Python 3.8 specifically (was previously just python3, but this is often python3.6, support for which was dropped in Toil 5.6)
- cactus-prepare WDL output can now batch up `hal_append_subtree` jobs
- Fix bug where "reference" phase within cactus_consolidated could take ages on some input
- Fix bug where --realTimeLogging flag would cause infinite loop after cactus_consolidated within some Docker invocations.
- Upgrade to more recent version of lastz
Cactus 2.0.4 2021-11-12
Cactus 2.0.4 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.4`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.4-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.0.4.tar.gz
  - Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.0.4.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.0.4.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This release fixes a bug introduced in version 2.0.0 where ancestral sequences could not be specified in the input, which prevented the recommended procedure for updating existing alignments from working.
- Fix ``Assertion `cap_getSequence(cap) == sequence' failed`` error when ancestral fasta provided in input seqfile.
- Several minor pangenome updates, mostly in `cactus-graphmap-join`
Cactus 2.0.3 2021-07-22
Cactus 2.0.3 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.3`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.3-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.0.3.tar.gz
  - Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.0.3.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.0.3.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This release fixes some issues in pangenome normalization and CAF running time.
- Fix new regression that caused CAF's secondary filter to sometimes take forever. This code has been causing occasional slowdowns for some time, but should finally be fixed once and for all.
- Fix cactus-preprocess to work on zipped fasta inputs even when not running dna-brnn.
- Fix normalization in cactus-graphmap-join
- Update to abPOA v1.2.5
Cactus 2.0.2 2021-07-07
Cactus 2.0.2 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.2`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.2-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.0.2.tar.gz
  - Install instructions in BIN-INSTALL.md
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.0.2.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.0.2.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built using AVX2 extensions and require a CPU that supports them, except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This release primarily addresses stability issues during pangenome construction.
Changelog:
- Use latest abpoa, which fixes bug where aligning >1024 sequences would lead to a segfault
- Update to Toil 5.4.0
- More consistently apply filters to minimap2 output in the fallback stage of graphmap-split
- Build abpoa with AVX2 SIMD extensions instead of SSE4.1 in order to work around instability when building pangenomes. This ups the hardware requirements for releases, unfortunately, as AVX2 is slightly newer.
- Clean up CAF config parameters
- Fix CAF secondary filter worst-case runtime issue. It was very rare but could add days to runtime.
- Slightly tune minimap2 thresholds used for chromosome splitting
- Normalization option added to cactus-graphmap-join (should be used to work around soon-to-be addressed underalignment bug)
Cactus 2.0.1 2021-06-19
Cactus 2.0.1 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.1`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.1-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.0.1.tar.gz
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.0.1.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.0.1.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built for the Intel Nehalem architecture and require a CPU that supports it (typically something from 2008 or later), except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This is a patch release that fixes an issue where the new `--consCores` option could not be used with `--maxCores` (Thanks @RenzoTale88). It also reverts some last-minute CAF parameter changes to something more tested (though known to be slow in some cases with large numbers of secondary alignments).
Changelog:
- Fix bug where `cactus` doesn't work when both `--maxCores` and `--consCores` are specified.
- Static binaries script more portable.
- Revert CAF trimming parameters to their previous defaults.
Cactus 2.0.0 2021-06-18
Cactus 2.0.0 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.0`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v2.0.0-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v2.0.0.tar.gz
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v2.0.0.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v2.0.0.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built for the Intel Nehalem architecture and require a CPU that supports it (typically something from 2008 or later), except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This release includes a major update to the Cactus workflow which should dramatically improve both speed and robustness. Previously, Cactus used a multiprocess architecture for all cactus graph operations (everything after the "blast" phase). Each process was run in its own Toil job, and they would communicate via the CactusDisk database that ran as its own separate service process (ktserver by default). Reading from and writing to the database was often a bottleneck, and it would fail sporadically on larger inputs with frustrating "network errors". This has all now been changed to run as a single multithreaded executable, `cactus_consolidated`. Apart from saving on database I/O, `cactus_consolidated` now uses the much faster, SIMD-accelerated abPOA by default instead of cPecan for performing multiple sequence alignments within the BAR phase.
Cactus was originally designed for a heterogeneous compute environment where a handful of large memory machines ran a small number of jobs, and much of the compute could be farmed off to a large number of smaller machines. While lastz jobs from the "preprocess" and "blast" phases (or `cactus-preprocess` and `cactus-blast`) can still be farmed out to smaller machines, the rest of cactus (`cactus-align`) can now only be run on more powerful systems. The exact requirements depend as usual on genome size and divergence, but roughly 64 cores / 512G RAM are required for distant mammals.
This release also contains several fixes and usability improvements for the pangenome pipeline, and finally includes `halPhyloP`.
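As a rough aid for the hardware guideline quoted above (about 64 cores and 512G of RAM for distant mammals), the following sketch compares the local machine against those numbers. It is not part of Cactus; the function names are hypothetical, the thresholds are only the approximate figures from these notes, and the memory query relies on `os.sysconf`, which is assumed to be available only on Linux-like systems.

```python
import os


def meets_guideline(cores, ram_gib, min_cores=64, min_ram_gib=512):
    """Return True when both the core count and RAM reach the rough guideline."""
    return cores >= min_cores and ram_gib >= min_ram_gib


def local_resources():
    """Return (cores, ram_gib) for this machine; RAM is 0.0 if it cannot be queried."""
    cores = os.cpu_count() or 0
    try:
        ram_bytes = os.sysconf("SC_PHYS_PAGES") * os.sysconf("SC_PAGE_SIZE")
    except (ValueError, OSError, AttributeError):
        ram_bytes = 0
    return cores, ram_bytes / 2**30


if __name__ == "__main__":
    cores, ram_gib = local_resources()
    print(cores, round(ram_gib, 1), meets_guideline(cores, ram_gib))
```

Since the notes give the figures only as rough guidance, and a machine sold with 512G typically reports slightly less usable memory, a `False` result here is a prompt to look closer rather than a hard verdict.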
Changelog:
- Fold all post-blast processing into single binary executable, `cactus_consolidated`
- New option, `--consCores`, to control the number of threads for each `cactus_consolidated` process.
- Cactus database (ktserver) no longer used.
- abPOA now default base aligner, replacing cPecan
- cPecan updated to include multithreading support via MUM anchors (as opposed to spawning lastz processes), and can be toggled on in the config
- Fix bug in how `cactus-prepare` transmits Toil size parameters
- `cactus-prepare-join` tool added to combine and index chromosome output from `cactus-align-batch`
- `cactus-graphmap-split` fixes
- Update to latest Segalign
- Update to Toil 5.3
- Update HAL
- Add `halPhyloP` to binary release and docker images
Cactus 1.3.0 2021-02-11
Cactus 1.3.0 is available in the following forms:
- Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v1.3.0`
  - GPU-accelerated Docker Image: `quay.io/comparative-genomics-toolkit/cactus:v1.3.0-gpu`
  - Install instructions in README.md
- Pre-compiled Binaries Linux Tarball: cactus-bin-v1.3.0.tar.gz
- Pre-compiled Binaries For Older CPU Architectures (no pangenome support) Linux Tarball: cactus-bin-legacy-v1.3.0.tar.gz
  - Install instructions in BIN-INSTALL.md
- Source Tarball: cactus-v1.3.0.tar.gz
  - Install instructions in README.md
WARNING: do not use the GitHub automatically generated source files (Source code (zip) or Source code (tar.gz)); these are not correct.
The Docker images and binaries linked above are built for the Intel Nehalem architecture and require a CPU that supports it (typically something from 2008 or later), except the "Pre-compiled Binaries For Older CPU Architectures", which should be compatible with any 64-bit architecture (but do not yet support Cactus's pangenome pipeline).
Please subscribe to the low-volume cactus-announce mailing list to receive notice of Cactus releases.
Release notes
This release introduces the Cactus Pangenome Pipeline, which can be used to align together samples from the same species in order to create a pangenome graph:
- `cactus_bar` now has a POA-mode via the abpoa aligner, which scales better than Pecan for large numbers of sequences and is nearly as accurate if the sequences are highly similar
- `cactus-refmap` tool added to produce cactus alignment anchors with all-to-reference minimap2 alignments instead of all-to-all lastz
- `cactus-graphmap` tool added to produce cactus alignment anchors with all-to-reference-graph minigraph alignments instead of all-to-all lastz
- `--maskAlpha` option added to `cactus-preprocess` to softmask (or clip out) satellite sequence using `dna-brnn`.
- `cactus_bar` now has an option to ignore masked sequence with a given length threshold.
- `cactus-graphmap-split` tool added to split input fasta sequences into chromosomes using minigraph alignments in order to create alignment subproblems and improve scaling.
- `cactus-align-batch` tool added to align several chromosomes at once using one job per chromosome. (`--batch` option added to `cactus-align` to achieve the same using many jobs per chromosome)
- `--outVG` and `--outGFA` options added to `cactus-align` to output pangenome graphs in addition to hal.
Other changes:
- `cactus-prepare` scheduling bug fix
- `--database redis` option added to use Redis instead of Kyoto Tycoon
- `cactus-blast --restart` bug fix
- "Legacy" binary release provided for those whose hardware is too old to run the normal release.