Version 2.0
Version 2.0 adds a wide range of new features. Highlights:
- Python 3 is now required
- BLAT has been fully replaced with parasail (pairwise DNA alignments) or exonerate (protein-genome alignments)
- Support for updated clusterGenes that allows genes to be placed in separate clusters even if they share a few bases of overlap. This is useful for compact genomes. The threshold is set by the --overlapping-gene-distance flag, and defaults to 30 (exonic) bases.
- New flags for controlling how de novo gene predictions are incorporated:
  - --denovo-ignore-novel-genes: For de novo predictions, discard any transcripts that are predicted to be novel genes. In other words, only retain putative novel isoforms.
  - --denovo-novel-end-distance: For de novo predictions, allow transcripts to be included if they provide a novel 5' or 3' end N distance away from any existing ends. Default is 0.
  - --denovo-allow-unsupported: For de novo predictions, allow novel isoforms to be called if they contain splices that are not supported by the reference annotation even if they are also not supported by RNA-seq. Without this flag, novel isoforms will only be called if they have one or more splices that have RNA-seq/IsoSeq support and no reference annotation support.
  - --denovo-allow-bad-annot-or-tm: For de novo predictions, allow novel isoforms to be called that were flagged as BadAnnotOrTm. These predictions overlap instances where multiple genes transMapped to the same location with significant overlap, and so may be alignment mistakes, collapsed repeats or gene family collapse.
- GFF3 parsing is now stricter. CAT only accepts GFF3 files that fit the required format. To help with this, new parsers have been placed in the programs folder that clean up GenBank files from RefSeq and from GenBank, as well as GFF3 files produced by Prokka.
- You can test your GFF3 against the parser with the script validate_gff3. If your GFF3 passes this tool, it will work with CAT.
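
To make the --overlapping-gene-distance threshold above more concrete, here is a minimal Python sketch of the idea: count the exonic bases two genes share and treat them as the same cluster only when that count exceeds the threshold. This is an illustration, not the clusterGenes or CAT implementation. The function names, the half-open coordinate convention, and the choice of a strict "greater than" comparison are all assumptions made for the example.

```python
from typing import List, Tuple

# Assumption: exons are half-open [start, end) intervals on the same
# chromosome and strand. These names are hypothetical, not part of CAT.
Interval = Tuple[int, int]


def merge_intervals(exons: List[Interval]) -> List[Interval]:
    """Merge overlapping or adjacent exons so each base is counted once."""
    merged: List[Interval] = []
    for start, end in sorted(exons):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def shared_exonic_bases(gene_a: List[Interval], gene_b: List[Interval]) -> int:
    """Count bases that are exonic in both genes."""
    a, b = merge_intervals(gene_a), merge_intervals(gene_b)
    i = j = total = 0
    while i < len(a) and j < len(b):
        overlap = min(a[i][1], b[j][1]) - max(a[i][0], b[j][0])
        if overlap > 0:
            total += overlap
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def same_cluster(
    gene_a: List[Interval],
    gene_b: List[Interval],
    overlapping_gene_distance: int = 30,
) -> bool:
    """Assumed rule: cluster together only if shared exonic bases exceed the threshold."""
    return shared_exonic_bases(gene_a, gene_b) > overlapping_gene_distance


gene_a = [(100, 200), (300, 400)]
gene_b = [(390, 500)]              # shares 10 exonic bases with gene_a
gene_c = [(150, 200), (300, 350)]  # shares 100 exonic bases with gene_a

print(same_cluster(gene_a, gene_b))     # False with the default of 30
print(same_cluster(gene_a, gene_c))     # True
print(same_cluster(gene_a, gene_b, 5))  # True once the threshold is lowered
```

Under this reading, lowering the value makes clustering more aggressive, while raising it lets neighboring genes in compact genomes stay separate despite small exonic overlaps.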