From fc0d22c8bf3ed554dac43056458a7be583032538 Mon Sep 17 00:00:00 2001
From: Luis Pedro Coelho <luis@luispedro.org>
Date: Mon, 16 Jan 2023 18:42:58 +0100
Subject: [PATCH] RLS Version 1.5.0 SemiBin2 beta
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The big change is the addition of a `SemiBin2` script, which is still experimental but should offer a slightly nicer interface.

USER-VISIBLE IMPROVEMENTS SINCE v1.4.0

- Added a new option for ORF finding, called `fast-naive`, which is a very fast internal implementation.
- Added the possibility of bypassing ORF finding altogether by providing Prodigal outputs directly (or any other gene prediction in the right format)
- Command line argument checking is more exhaustive (it no longer exits at the first error)
- Added `--quiet` flag to reduce the amount of output printed
- Better `--help` (group required arguments separately)
- Add `--output-compression` option to compress outputs
- Add `--tag-output` option, which allows control of the output filenames (and also makes them anvi'o compatible; see discussion at [#123](https://github.com/BigDataBiology/SemiBin/issues/123))
- Add contig->bin mapping table ([#123](https://github.com/BigDataBiology/SemiBin/issues/123))
- `SemiBin.main.main1` and `SemiBin.main.main2` can now be called as functions with a list of command line arguments (`main1` corresponds to _SemiBin1_ and `main2` corresponds to _SemiBin2_)

```python
import SemiBin.main

...

SemiBin.main.main2(['single_easy_bin', '--input-fasta', ...])
```
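
Wrapper scripts that post-process SemiBin2 results may need small changes, because bins now always land in `output_bins` and are named `SemiBin_{label}.fa.gz` by default (see `docs/semibin2.md` in this patch). The following is only a minimal sketch of such an adjustment: `list_semibin2_bins` and `count_contigs` are hypothetical helper names, not part of SemiBin, and the `tag` default simply mirrors the documented default of `--tag-output`.

```python
import gzip
from pathlib import Path


def list_semibin2_bins(output_dir, tag="SemiBin"):
    """Return the sorted bin files found under <output_dir>/output_bins.

    Assumes the SemiBin2 defaults described in docs/semibin2.md:
    gzip-compressed FASTA files named <tag>_<label>.fa.gz.
    """
    bins_dir = Path(output_dir) / "output_bins"
    return sorted(bins_dir.glob(f"{tag}_*.fa.gz"))


def count_contigs(bin_path):
    """Count FASTA records (header lines starting with '>') in a gzipped bin."""
    with gzip.open(bin_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A wrapper could then loop over `list_semibin2_bins(out)` instead of hard-coding the old uncompressed filenames. If a different `--tag-output` or compression setting is used, the pattern above would need to change accordingly.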
---
 ChangeLog                  | 12 ++++++------
 SemiBin/semibin_version.py |  2 +-
 docs/semibin2.md           | 19 ++++++++++++++++---
 docs/whatsnew.md           |  9 +++++++--
 4 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index 9e7dc6c..7e1ad6a 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,17 +1,17 @@
-Unreleased
+Version 1.5.0 (SemiBin2 beta) Jan 17 2023 by BigDataBiology
 	* Add `SemiBin2` script
 	* Added naive ORF finder
-	* Make command line arguments more flexible for --sequencing-type argument
 	* Add `--prodigal-output-faa` argument (#113)
+	* Make command line arguments more flexible for --sequencing-type argument
 	* Argument checking is more exhaustive instead of exiting at first error
 	* Add `--quiet` argument
-	* Better `--help` (group required arguments separately)
 	* Add `--compression` option
-	* Make SemiBin.main.main callable with a list of arguments
 	* Add `--tag-output` option
-	* Add contig->bin mapping table (#123)
+	* Better `--help` (group required arguments separately)
+	* Make SemiBin.main.main2 callable with a list of arguments
+	* Add contig -> bin mapping table (#123)
 
-Version 1.4.0 Dec  2022 by BigDataBiology
+Version 1.4.0 Dec 15 2022 by BigDataBiology
 	* Provide binning algorithm for assemblies from long read
 	* Add `--allow-missing-mmseqs2` flag to `check_install` subcommand
 	* Run Prodigal in multiple jobs without multiprocessing (#106)
diff --git a/SemiBin/semibin_version.py b/SemiBin/semibin_version.py
index 96e3ce8..77f1c8e 100644
--- a/SemiBin/semibin_version.py
+++ b/SemiBin/semibin_version.py
@@ -1 +1 @@
-__version__ = '1.4.0'
+__version__ = '1.5.0'
diff --git a/docs/semibin2.md b/docs/semibin2.md
index 07be5ff..efa8523 100644
--- a/docs/semibin2.md
+++ b/docs/semibin2.md
@@ -7,13 +7,25 @@ They have the same functionality, but slightly different interfaces. The exact
 interface to `SemiBin2` should be considered as unstable (while we will strive
 to maintain backwards compatibility if you call the `SemiBin` script).
 
-# Differences between SemiBin2 and SemiBin1
+## Upgrading to SemiBin2
+
+1. If you are using the `easy_*` workflows, then they will probably continue to
+   work exactly the same (except that you will get better results faster).
+2. Outputs are now **always** in a directory called `output_bins`.
+3. By default, bins are written to files named `SemiBin_{label}.fa.gz` (and
+   compressed with _gzip_, as the name indicates).
+
+Points `2` and `3` may require some minor modifications to wrapper scripts.
+
+## Longer list of differences between SemiBin2 and SemiBin1
 
 The biggest different is that the default training mode is self-supervised mode.
 
 - Output bins are now **always** in a directory called `output_bins` (in
-- Output filenames are now anvi'o compatible (effectively, the default value of `--tag-output` is `SemiBin`) (see discussion in [#123](https://github.com/BigDataBiology/SemiBin/issues/123))
   _SemiBin1_, it actually depended on which parameters were used)
+- Output filenames are now anvi'o compatible (effectively, the default value of
+  `--tag-output` is `SemiBin`); see discussion at
+  [#123](https://github.com/BigDataBiology/SemiBin/issues/123).
 - `--compression` defaults to `gz` (instead of `none`)
 - ORF finder defaults to the `fast-naive` internal ORF finder
 - `--write-pre-reclustering-bins` is `False` by default
@@ -24,5 +36,6 @@ The biggest different is that the default training mode is self-supervised mode.
 A few arguments that were deprecated before are completely removed:
 - `--recluster`: it did nothing already as reclustering is default
 - `--mode`: Use `--train-from-many`
-- `--training-type`: Use `--semi-supervised` to use semi-supervised learning (although that is also deprecated)
+- `--training-type`: Use `--semi-supervised` to use semi-supervised learning
+  (although that is also deprecated)
 
diff --git a/docs/whatsnew.md b/docs/whatsnew.md
index 7a2145c..0cb42b4 100644
--- a/docs/whatsnew.md
+++ b/docs/whatsnew.md
@@ -1,6 +1,11 @@
 # What's New
 
-## Unreleased github version
+## Version 1.5.0 (SemiBin2 beta)
+
+*Released Jan 17, 2023*
+
+The big change is the addition of a `SemiBin2` script, which is still experimental but should offer a slightly nicer interface.
+See [upgrading to SemiBin2](semibin2).
 
 ### User-visible improvements
 
@@ -10,7 +15,7 @@
 - Added `--quiet` flag to reduce the amount of output printed
 - Better `--help` (group required arguments separately)
 - Add `--output-compression` option to compress outputs
-- Add `--tag-output` option which allows for control of the output filenames (and also makes the anvi'o compatible)
+- Add `--tag-output` option, which allows control of the output filenames (and also makes them anvi'o compatible; see discussion at [#123](https://github.com/BigDataBiology/SemiBin/issues/123))
 - Add contig->bin mapping table ([#123](https://github.com/BigDataBiology/SemiBin/issues/123))
 - `SemiBin.main.main1` and `SemiBin.main.main2` can now be called as a function with command line arguments (`main1` corresponds to _SemiBin1_ and `main2` corresponds to _SemiBin2_)