Releases: zellerlab/GECCO

0.8.4

25 Sep 23:35

Fixed

  • gecco convert gbk --format bigslice failing to run because of outdated code (#5).
  • gecco convert gbk --format bigslice not creating files with names conforming to BiG-SLiCE expected input.

Changed

  • Bump minimum pyrodigal version to v0.6.2 to use platform-accelerated code if supported.

0.8.3-post1

23 Aug 21:51

Fixed

  • Wrong default value for --threshold being shown in gecco run help message.

0.8.3

23 Aug 10:18

Changed

  • Default probability threshold for segmentation lowered to 0.3 (from 0.4).
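
To illustrate what a lower threshold means in practice, here is a minimal sketch of one simple way a probability threshold can turn per-gene probabilities into contiguous segments. The function name `segments_above`, the strict `>` comparison and the input values are assumptions made for illustration; this is not GECCO's actual segmentation code.

```python
def segments_above(probabilities, threshold=0.3):
    """Return (start, end) index pairs of contiguous runs above the threshold.

    Hypothetical helper: `end` is exclusive, and the strict comparison is an
    assumption, not something stated in the release notes.
    """
    segments = []
    start = None
    for i, p in enumerate(probabilities):
        if p > threshold and start is None:
            start = i                      # a new run begins here
        elif p <= threshold and start is not None:
            segments.append((start, i))    # the current run ends before i
            start = None
    if start is not None:                  # a run reaching the last gene
        segments.append((start, len(probabilities)))
    return segments


# Made-up per-gene probabilities, for illustration only.
probs = [0.1, 0.35, 0.5, 0.2, 0.9]
print(segments_above(probs, threshold=0.3))
print(segments_above(probs, threshold=0.4))
```

With the lower threshold, the gene at index 1 (probability 0.35) joins the first segment, so lowering the default can only keep segments the same size or make them larger.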

0.8.2

31 Jul 10:42

Fixed

  • gecco run crashing on Python 3.6 because of missing contextlib.nullcontext class.
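
For context, `contextlib.nullcontext` was added to the standard library in Python 3.7, so code that must also run on Python 3.6 needs a fallback. The sketch below shows one common way to provide it; it is not necessarily the fix applied in GECCO.

```python
import contextlib

try:
    # Available in the standard library from Python 3.7 onwards.
    from contextlib import nullcontext
except ImportError:
    @contextlib.contextmanager
    def nullcontext(enter_result=None):
        """Minimal stand-in for contextlib.nullcontext on Python 3.6."""
        yield enter_result


# Usage: a context manager that does nothing and hands back its argument.
with nullcontext("unchanged") as value:
    print(value)
```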

Changed

  • gecco run and gecco annotate will not try to count the number of profiles when given an external HMM file with the --hmm flag.
  • PyHMMER.run now reports the p-value of each domain in addition to the e-value as a /note qualifier.
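
As background on the two statistics: in HMMER, an E-value is the p-value of a hit scaled by the size of the search space (the `Z` parameter). The sketch below illustrates that relationship only. The function name `evalue_from_pvalue` and the numbers are hypothetical, and which of `Z` or `domZ` applies to a given domain E-value is left out here.

```python
import math


def evalue_from_pvalue(pvalue, z):
    """Expected number of hits at least this good among `z` comparisons.

    Hypothetical helper illustrating E-value = p-value * Z.
    """
    return pvalue * z


# The same p-value gives different E-values for different search-space sizes,
# which is why the p-value is the more portable of the two.
small = evalue_from_pvalue(1e-6, 1_000)
large = evalue_from_pvalue(1e-6, 1_000_000)
print(small, large)
assert math.isclose(large / small, 1_000.0)
```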

0.8.1

29 Jul 17:00

Changed

  • gecco run now filters out unneeded features before annotating, making it easier to analyze the results of a run with a custom --model.

Fixed

  • gecco reporting the use of Pfam v33.1 while actually using v34.0 because of an outdated field in gecco/hmmer/Pfam.ini.

Added

  • Missing documentation for the strand attribute of gecco.model.Gene.

0.8.0

03 Jul 13:11

Changed

  • Retrain internal model using new sequence embeddings and remove broken/duplicate BGCs from MIBiG 2.0.
  • Bump minimum pyhmmer version to v0.4.0 to improve exception handling.
  • Bump minimum pyrodigal version to v0.5.0 to fix sequence decoding on some platforms.
  • Use p-values instead of e-values to filter domains obtained with HMMER.
  • gecco cv and gecco train now seed the RNG with a user-defined seed before shuffling rows of training data.
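
Seeding the RNG before shuffling makes the row order reproducible between runs. A minimal sketch of the idea with the standard library follows; the helper name `shuffled_rows` is hypothetical, and GECCO may use a different RNG.

```python
import random


def shuffled_rows(rows, seed):
    """Return a shuffled copy of `rows`, reproducible for a given seed."""
    rng = random.Random(seed)   # private RNG, leaves the global state untouched
    out = list(rows)
    rng.shuffle(out)
    return out


rows = ["bgc_a", "bgc_b", "bgc_c", "bgc_d"]   # made-up row labels
# The same seed always yields the same order.
assert shuffled_rows(rows, seed=42) == shuffled_rows(rows, seed=42)
```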

Fixed

  • Extraction of BGC compositions for the type predictor while training.
  • ClusterCRF.trained failing to open an external model.

Added

  • Domain.pvalue attribute to access the p-value of a domain annotation.
  • Mandatory pvalue column to FeatureTable objects.
  • Support for loading several feature tables in gecco train and gecco cv.
  • Warnings to ClusterCRF.fit when selecting uninformative features.
  • --correction flag for gecco train and gecco cv, to select a multiple testing correction method when computing p-values with Fisher's exact test.
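
The release notes do not say which correction methods the --correction flag accepts. As an illustration of what a multiple testing correction does, here is a sketch of the Benjamini-Hochberg procedure, one widely used method, in plain Python. The function name and the p-values are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Made-up p-values, e.g. one per tested feature.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

Each adjusted value is at least as large as the raw p-value, so fewer features pass a fixed significance cut-off once many tests are run together.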

Removed

  • Outdated gecco embed command.
  • Unused --truncate flag from the gecco train CLI.
  • Tigrfam domains, which were not improving performance on the new training data.

0.7.0

31 May 20:40

Added

  • Support for writing an AntiSMASH sideload JSON file after a gecco run workflow.
  • Code for converting GenBank files to a BiG-SLiCE-compatible format with the gecco convert subcommand.
  • Documentation about using GECCO in combination with AntiSMASH or BiG-SLiCE.

Changed

  • Set minimum Biopython version to v1.73 for compatibility with older bioinformatics tooling.
  • Replace the internal domain composition shipped in gecco.types with a newer composition array obtained directly from MIBiG files.

Removed

  • Outdated notice about -vvv verbosity level in the help message of the main gecco command.

0.6.3

10 May 11:55

Fixed

  • HMMER annotation not properly handling inputs with multiple contigs.
  • Some progress bar totals displaying as floats in the CLI.

Changed

  • PyHMMER now sets the Z and domZ values from the number of proteins given to the search pipeline.
  • gecco.cli delegates imports to make the CLI more responsive.
  • pkg_resources has been replaced with importlib.resources and importlib.metadata where applicable.
  • multiprocessing.cpu_count has been replaced with os.cpu_count where applicable.
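
The last two replacements can be sketched as follows. This is a generic illustration of the standard-library calls involved, not GECCO's code: the helper names are hypothetical, and the fallback import assumes the `importlib_metadata` backport is installed on Python versions older than 3.8.

```python
import os

try:
    # Standard library from Python 3.8 onwards.
    from importlib.metadata import PackageNotFoundError, version
except ImportError:
    # Assumption: the importlib_metadata backport is available on older Pythons.
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def available_cpus():
    """Return the logical CPU count; os.cpu_count() may return None."""
    return os.cpu_count() or 1


print(installed_version("gecco-tool"), available_cpus())
```

Unlike `multiprocessing.cpu_count`, which raises `NotImplementedError` when the count cannot be determined, `os.cpu_count` returns `None`, hence the `or 1` fallback.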

0.6.2

04 May 18:07

Fixed

  • gecco cv loto crashing because of outdated code.

Changed

  • Logging-style prompt is now displayed only if GECCO is run with the -vv flag.

Added

  • GECCO bioRxiv paper reference to Cluster.to_seq_record output record.

0.6.1

15 Mar 15:18

Fixed

  • Progress bar not being disabled by -q flag in CLI.
  • Fallback to using HMM name if accession is not available in PyHMMER.
  • Group genes by source contig and process them separately in PyHMMER to avoid bogus E-values.

Added

  • psutil dependency to get the number of physical CPU cores on the host machine.
  • Support for using an arbitrary mapping of positives to negatives in gecco embed.
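
Regarding the psutil dependency added above: `psutil.cpu_count(logical=False)` reports physical cores, and it can return `None` when the count cannot be determined. The sketch below shows one way to query it with a fallback to the logical count. The helper name `physical_cores` and the fallback policy are assumptions, not GECCO's implementation.

```python
import os


def physical_cores():
    """Return the number of physical cores, falling back to logical CPUs."""
    try:
        import psutil
    except ImportError:
        psutil = None   # psutil is a third-party package and may be missing
    count = psutil.cpu_count(logical=False) if psutil is not None else None
    # Either call may return None, so end with a safe default of 1.
    return count or os.cpu_count() or 1


print(physical_cores())
```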

Removed

  • Unused and outdated HMMER and DomainRow classes from gecco.hmmer.