From fc0d22c8bf3ed554dac43056458a7be583032538 Mon Sep 17 00:00:00 2001
From: Luis Pedro Coelho
Date: Mon, 16 Jan 2023 18:42:58 +0100
Subject: [PATCH] RLS Version 1.5.0 SemiBin2 beta
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The big change is the addition of a `SemiBin2` script, which is still experimental, but should be a slightly nicer interface.

USER-VISIBLE IMPROVEMENTS SINCE v1.4.0

- Added a new option for ORF finding, called `fast-naive`, which is a very fast internal implementation.
- Added the possibility of bypassing ORF finding altogether by providing prodigal outputs directly (or any other gene prediction in the right format)
- Command line argument checking is more exhaustive instead of exiting at first error
- Added `--quiet` flag to reduce the amount of output printed
- Better `--help` (group required arguments separately)
- Add `--output-compression` option to compress outputs
- Add `--tag-output` option which allows for control of the output filenames (and also makes them anvi'o compatible; see discussion at [#123](https://github.com/BigDataBiology/SemiBin/issues/123)).
- Add contig->bin mapping table ([#123](https://github.com/BigDataBiology/SemiBin/issues/123))
- `SemiBin.main.main1` and `SemiBin.main.main2` can now be called as functions with command line arguments (`main1` corresponds to _SemiBin1_ and `main2` corresponds to _SemiBin2_)

```python
import SemiBin.main
...
SemiBin.main.main2(['single_easy_bin', '--input-fasta', ...])
```
---
 ChangeLog                  | 12 ++++++------
 SemiBin/semibin_version.py |  2 +-
 docs/semibin2.md           | 19 ++++++++++++++++---
 docs/whatsnew.md           |  9 +++++++--
 4 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 9e7dc6c..7e1ad6a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,17 +1,17 @@
-Unreleased
+Version 1.5.0 (SemiBin2 beta) Jan 17 2023 by BigDataBiology
 	* Add `SemiBin2` script
 	* Added naive ORF finder
-	* Make command line arguments more flexible for --sequencing-type argument
 	* Add `--prodigal-output-faa` argument (#113)
+	* Make command line arguments more flexible for --sequencing-type argument
 	* Argument checking is more exhaustive instead of exiting at first error
 	* Add `--quiet` argument
-	* Better `--help` (group required arguments separately)
 	* Add `--compression` option
-	* Make SemiBin.main.main callable with a list of arguments
 	* Add `--tag-output` option
-	* Add contig->bin mapping table (#123)
+	* Better `--help` (group required arguments separately)
+	* Make SemiBin.main.main2 callable with a list of arguments
+	* Add contig -> bin mapping table (#123)
 
-Version 1.4.0 Dec 2022 by BigDataBiology
+Version 1.4.0 Dec 15 2022 by BigDataBiology
 	* Provide binning algorithm for assemblies from long read
 	* Add `--allow-missing-mmseqs2` flag to `check_install` subcommand
 	* Run Prodigal in multiple jobs without multiprocessing (#106)
diff --git a/SemiBin/semibin_version.py b/SemiBin/semibin_version.py
index 96e3ce8..77f1c8e 100644
--- a/SemiBin/semibin_version.py
+++ b/SemiBin/semibin_version.py
@@ -1 +1 @@
-__version__ = '1.4.0'
+__version__ = '1.5.0'
diff --git a/docs/semibin2.md b/docs/semibin2.md
index 07be5ff..efa8523 100644
--- a/docs/semibin2.md
+++ b/docs/semibin2.md
@@ -7,13 +7,25 @@ They have the same functionality, but slightly different interfaces.
 The exact interface to `SemiBin2` should be considered as unstable (while we will strive to maintain backwards compatibility if you call the `SemiBin` script).
 
-# Differences between SemiBin2 and SemiBin1
+## Upgrading to SemiBin2
+
+1. If you are using the `easy_*` workflows, then they will probably continue to
+   work exactly the same (except that you will get better results faster).
+2. Outputs are now **always** in a directory called `output_bins`.
+3. By default, bins are in file named as `SemiBin_{label}.fa.gz` (and
+   compressed with _gzip_ as the name indicates).
+
+Points `2` and `3` may require some minor modifications to wrapper scripts.
+
+## Longer list of differences between SemiBin2 and SemiBin1
 
 The biggest different is that the default training mode is self-supervised mode.
 
 - Output bins are now **always** in a directory called `output_bins` (in
-- Output filenames are now anvi'o compatible (effectively, the default value of `--tag-output` is `SemiBin`) (see discussion in [#123](https://github.com/BigDataBiology/SemiBin/issues/123))
   _SemiBin1_, it actually depended on which parameters were used)
+- Output filenames are now anvi'o compatible (effectively, the default value of
+  `--tag-output` is `SemiBin`), see discussion at
+  [#123](https://github.com/BigDataBiology/SemiBin/issues/123).
 - `--compression` defaults to `gz` (instead of `none`)
 - ORF finder defaults to the `fast-naive` internal ORF finder
 - `--write-pre-reclustering-bins` is `False` by default
@@ -24,5 +36,6 @@ The biggest different is that the default training mode is self-supervised mode.
 A few arguments that were deprecated before are completely removed:
 - `--recluster`: it did nothing already as reclustering is default
 - `--mode`: Use `--train-from-many`
-- `--training-type`: Use `--semi-supervised` to use semi-supervised learning (although that is also deprecated)
+- `--training-type`: Use `--semi-supervised` to use semi-supervised learning
+  (although that is also deprecated)
diff --git a/docs/whatsnew.md b/docs/whatsnew.md
index 7a2145c..0cb42b4 100644
--- a/docs/whatsnew.md
+++ b/docs/whatsnew.md
@@ -1,6 +1,11 @@
 # What's New
 
-## Unreleased github version
+## Version 1.5.0 (SemiBin2 beta)
+
+*Released Jan 17, 2023*
+
+Big change is the addition of a `SemiBin2` script, which is still experimental, but should be a slightly nicer interface.
+See [[upgrading to SemiBin2](semibin2)]
 
 ### User-visible improvements
 
@@ -10,7 +15,7 @@
 - Added `--quiet` flag to reduce the amount of output printed
 - Better `--help` (group required arguments separately)
 - Add `--output-compression` option to compress outputs
-- Add `--tag-output` option which allows for control of the output filenames (and also makes the anvi'o compatible)
+- Add `--tag-output` option which allows for control of the output filenames (and also makes the anvi'o compatible — see discussion at [#123](https://github.com/BigDataBiology/SemiBin/issues/123).
 - Add contig->bin mapping table ([#123](https://github.com/BigDataBiology/SemiBin/issues/123))
 - `SemiBin.main.main1` and `SemiBin.main.main2` can now be called as a function with command line arguments (`main1` corresponds to _SemiBin1_ and `main2` corresponds to _SemiBin2_)
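
Note for wrapper scripts: points `2` and `3` of the upgrade notes in `docs/semibin2.md` (bins always under `output_bins`, named `SemiBin_{label}.fa.gz` by default) mean that a script which collects bins may need to look in a different place and handle gzip. Below is a minimal sketch of one way to do that; it is not part of this patch or of SemiBin itself. The helper names `find_bins`, `open_bin` and `count_contigs` are hypothetical, and the uncompressed `.fa` filename (for runs with compression set to `none`) is an assumption.

```python
import gzip
from pathlib import Path


def find_bins(output_dir, tag="SemiBin"):
    """Return bin FASTA files under `<output_dir>/output_bins`, sorted by path.

    Assumes the default naming `<tag>_{label}.fa.gz`; also accepts an
    uncompressed `<tag>_{label}.fa` (assumed name when compression is off).
    """
    bins_dir = Path(output_dir) / "output_bins"
    return sorted(
        p for p in bins_dir.glob(f"{tag}_*")
        if p.name.endswith((".fa", ".fa.gz"))
    )


def open_bin(path):
    """Open a bin file as text, transparently handling gzip compression."""
    path = Path(path)
    if path.suffix == ".gz":
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_contigs(path):
    """Count FASTA records (header lines starting with '>') in a bin file."""
    with open_bin(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A wrapper could then loop over `find_bins("my_output")` (where `my_output` stands for whatever output directory was passed to SemiBin) and call `count_contigs` on each path, without caring whether the bins were compressed.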