MMseqs2 Release 12-113e3
martin-steinegger released this 01 Sep 11:22
Breaking changes
- Remove `--add-internal-id` parameter from `result2msa`
- `filterdb --shuffle` now shuffles randomly instead of deterministically
- Taxonomy expressions in filtertax(seq)db now interpret `,` as `||` #320
- `convertalis` `pident` output field now correctly reports percentage (0-100) sequence identity instead of fraction (0.00-1.00); use `fident` to print the fraction instead
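
Downstream scripts that treated `pident` as a fraction may need adjusting after this change. A minimal sketch of the conversion in Python follows; the helper names `pident_to_fraction` and `passes_identity_cutoff` are hypothetical and not part of MMseqs2.

```python
def pident_to_fraction(pident_percent: float) -> float:
    """Convert a percentage identity (0-100) into a fraction (0.0-1.0).

    Hypothetical helper for post-processing scripts; not an MMseqs2 API.
    """
    if not 0.0 <= pident_percent <= 100.0:
        raise ValueError(f"expected a percentage in [0, 100], got {pident_percent}")
    return pident_percent / 100.0


def passes_identity_cutoff(pident_percent: float, min_fraction: float) -> bool:
    """Apply a cutoff given as a fraction to a pident value given as a percentage."""
    return pident_to_fraction(pident_percent) >= min_fraction
```

Alternatively, requesting `fident` as the output field avoids the conversion entirely.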
Features
- Support nucleotide clustering in `cluster` and `easy-cluster`
- Support other architectures (SSE2/ARM64/POWER8/POWER9/etc) through SIMDe
- Linclust is much faster on systems with a lot of CPU cores
- Clustering update is faster, more stable and correctly deals with deleted sequences #272
- Add easy workflow for reciprocal best hit searches `easy-rbh`
- Add SILVA, Pfam-B, dbCAN2 to `databases`
- `databases` produces taxonomy information for NR
- Replace old greedy incremental clustering with new memory efficient version
- Add `result2dnamsa` module to create MSAs of nucleotide sequences
- Continued progress on profile-profile searching (`result2pp`, `expandaln`, `expand2profile`), stay tuned!
- Add multi-parameter support to overwrite sequence-type-specific parameters, e.g. `--gap-open "nucl:5,aa:11"`
- Add ORF information as output options to `convertalis` (`qOrfStart`, `qOrfEnd`, `dbOrfStart`, `dbOrfEnd`)
- Speed up sorting using ips4o
- Speed up masking through new version of tantan
- Speed up multi-threaded writing of clustering results
- Speed up reading of database indices and merging target split databases
- Add memory tracking to account for index size when computing available memory (`--split-memory-limit` should be more reliable when searching/clustering billions of sequences)
- Add `--search-type 4` (translated/translated search) to `createindex`
- Add `convertalis --format-mode 3` HTML output based on MMseqs2 app (app.mmseqs.com)
- Improve memory management in `result2msa` and `result2profile` modules
- Add `msa2result` module to create an alignment result db from MSAs
- Add `filterresult` to slim down result dbs with pairwise HHblits filtering #316
- Add `--kmers-per-sequence-scale` to `linsearch` to extract a k-mer fraction instead of a fixed count
- Add a random integer to `--local-tmp` path to avoid race conditions if multiple MMseqs2 instances run on the same machine
- Add `--max-seqs` to `ungappedprefilter`
- Add `--tax-lineage-mode 2` parameter to print numeric taxids
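
To illustrate the multi-parameter syntax shown above (`"nucl:5,aa:11"`), here is a small Python sketch that splits such a string into per-sequence-type values. The function `parse_multi_param` is hypothetical and only mirrors the example syntax from these notes; it is not MMseqs2's own parser and makes no claim about how MMseqs2 handles other input forms.

```python
def parse_multi_param(value: str) -> dict[str, int]:
    """Split a string like "nucl:5,aa:11" into {"nucl": 5, "aa": 11}.

    Hypothetical illustration of the syntax only; entries without a
    "type:value" shape are rejected because their meaning is not
    described in the release notes.
    """
    result: dict[str, int] = {}
    for entry in value.split(","):
        seq_type, sep, number = entry.strip().partition(":")
        if not sep or not seq_type or not number:
            raise ValueError(f"expected 'type:value', got {entry!r}")
        result[seq_type] = int(number)
    return result
```

For example, `parse_multi_param("nucl:5,aa:11")` yields one gap-open value for nucleotide sequences and another for amino acid sequences.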
Bugs fixed
- `rbh` workflow was broken due to issues with `filterdb`
- Fix `-a` in RBH search to show alignments
- Fix PDB70 database creation in `databases`
- Fix aria2c download support
- Fix memory issues and MPI in kmermatcher
- Fix memory issues in `extractorfs` when using AVX2
- Fix `--cluster-reassign` to respect `--cov-mode`
- Set-cover supports up to 2^32 sequences (previously crashed with more than 2^31)
- Exit correctly if there is not enough disk space instead of crashing in the next module
- Fix `prefilter` order instability when searching very redundant databases
- Correctly parse keys from data files in `filterdb --filter-file`, this was causing instability in `linsearch`
- Allow overwriting string parameters with empty strings
- Fix ASAN issue in `extractorfs` when using AVX2
- Microtar would try to seek backwards constantly, resulting in horrible gzip read performance
- Avoid lookup writing to corrupt memory if an accession is too long
- Fix various inconsistencies and usability issues in `alignall`:
  - `--alignment-mode` inconsistent with `align` module
  - `--add-backtrace` did not do anything
- Fix restart of clusterings using reassignment (`cluster --cluster-reassign`)
- Fix createdb not correctly reading gz/bzip files with `--createdb-mode 1` #323