RLS Version 1.5.0 SemiBin2 beta
The big change is the addition of a `SemiBin2` script, which is still experimental but should offer a slightly nicer interface.

USER-VISIBLE IMPROVEMENTS SINCE v1.4.0

- Added a new option for ORF finding, called `fast-naive`, which is a very fast internal implementation.
- Added the possibility of bypassing ORF finding altogether by providing Prodigal outputs directly (or any other gene prediction in the right format)
- Command line argument checking is more exhaustive instead of exiting at first error
- Added `--quiet` flag to reduce the amount of output printed
- Better `--help` (group required arguments separately)
- Add `--output-compression` option to compress outputs
- Add `--tag-output` option, which allows control of the output filenames (and also makes them anvi'o compatible; see discussion at [#123](#123)).
- Add contig->bin mapping table ([#123](#123))
- `SemiBin.main.main1` and `SemiBin.main.main2` can now be called as a function with command line arguments (`main1` corresponds to _SemiBin1_ and `main2` corresponds to _SemiBin2_)

```python
import SemiBin.main

...

SemiBin.main.main2(['single_easy_bin', '--input-fasta', ...])
```
luispedro committed Jan 16, 2023
1 parent ac618b1 commit fc0d22c
Showing 4 changed files with 30 additions and 12 deletions.
12 changes: 6 additions & 6 deletions ChangeLog
@@ -1,17 +1,17 @@
Unreleased
Version 1.5.0 (SemiBin2 beta) Jan 17 2023 by BigDataBiology
* Add `SemiBin2` script
* Added naive ORF finder
* Make command line arguments more flexible for --sequencing-type argument
* Add `--prodigal-output-faa` argument (#113)
* Make command line arguments more flexible for --sequencing-type argument
* Argument checking is more exhaustive instead of exiting at first error
* Add `--quiet` argument
* Better `--help` (group required arguments separately)
* Add `--compression` option
* Make SemiBin.main.main callable with a list of arguments
* Add `--tag-output` option
* Add contig->bin mapping table (#123)
* Better `--help` (group required arguments separately)
* Make SemiBin.main.main2 callable with a list of arguments
* Add contig -> bin mapping table (#123)

Version 1.4.0 Dec 2022 by BigDataBiology
Version 1.4.0 Dec 15 2022 by BigDataBiology
* Provide binning algorithm for assemblies from long read
* Add `--allow-missing-mmseqs2` flag to `check_install` subcommand
* Run Prodigal in multiple jobs without multiprocessing (#106)
2 changes: 1 addition & 1 deletion SemiBin/semibin_version.py
@@ -1 +1 @@
__version__ = '1.4.0'
__version__ = '1.5.0'
19 changes: 16 additions & 3 deletions docs/semibin2.md
@@ -7,13 +7,25 @@ They have the same functionality, but slightly different interfaces. The exact
interface to `SemiBin2` should be considered as unstable (while we will strive
to maintain backwards compatibility if you call the `SemiBin` script).

# Differences between SemiBin2 and SemiBin1
## Upgrading to SemiBin2

1. If you are using the `easy_*` workflows, then they will probably continue to
work exactly the same (except that you will get better results faster).
2. Outputs are now **always** in a directory called `output_bins`.
3. By default, bins are in file named as `SemiBin_{label}.fa.gz` (and
compressed with _gzip_ as the name indicates).

Points `2` and `3` may require some minor modifications to wrapper scripts.
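
For wrapper scripts affected by points `2` and `3`, collecting the bins might be sketched as follows. This relies only on what is stated above (an `output_bins` directory holding gzip-compressed `SemiBin_{label}.fa.gz` files). The function names `list_bins` and `count_contigs` are hypothetical helpers, and the assumption that `output_bins` sits directly inside the directory passed in is illustrative, not a documented guarantee.

```python
import gzip
from pathlib import Path


def list_bins(output_dir, tag="SemiBin"):
    """Return sorted paths of gzip-compressed bins under output_dir/output_bins.

    Hypothetical helper: assumes files are named `{tag}_{label}.fa.gz`,
    as described in the upgrade notes.
    """
    bins_dir = Path(output_dir) / "output_bins"
    return sorted(bins_dir.glob(f"{tag}_*.fa.gz"))


def count_contigs(bin_path):
    """Count FASTA records (lines starting with '>') in a gzipped bin file."""
    with gzip.open(bin_path, "rt") as handle:
        return sum(1 for line in handle if line.startswith(">"))
```

A wrapper that previously globbed for uncompressed FASTA files would need both changes: the new directory name and reading through `gzip` (or another gzip-aware reader).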

## Longer list of differences between SemiBin2 and SemiBin1

The biggest different is that the default training mode is self-supervised mode.

- Output bins are now **always** in a directory called `output_bins` (in
- Output filenames are now anvi'o compatible (effectively, the default value of `--tag-output` is `SemiBin`) (see discussion in [#123](https://github.com/BigDataBiology/SemiBin/issues/123))
_SemiBin1_, it actually depended on which parameters were used)
- Output filenames are now anvi'o compatible (effectively, the default value of
`--tag-output` is `SemiBin`), see discussion at
[#123](https://github.com/BigDataBiology/SemiBin/issues/123).
- `--compression` defaults to `gz` (instead of `none`)
- ORF finder defaults to the `fast-naive` internal ORF finder
- `--write-pre-reclustering-bins` is `False` by default
@@ -24,5 +36,6 @@ The biggest different is that the default training mode is self-supervised mode.
A few arguments that were deprecated before are completely removed:
- `--recluster`: it did nothing already as reclustering is default
- `--mode`: Use `--train-from-many`
- `--training-type`: Use `--semi-supervised` to use semi-supervised learning (although that is also deprecated)
- `--training-type`: Use `--semi-supervised` to use semi-supervised learning
(although that is also deprecated)
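
When migrating an existing command line to `SemiBin2`, a wrapper could scan for the removed arguments before calling the tool. The sketch below is not part of SemiBin: `REMOVED_FLAGS` and `find_removed_flags` are hypothetical names, and the mapping only restates the list above. It reports removed flags rather than rewriting them, since the replacement arguments may not take the same values.

```python
# Removed flags and the replacement named in the list above
# (None means the flag can simply be dropped).
REMOVED_FLAGS = {
    "--recluster": None,
    "--mode": "--train-from-many",
    "--training-type": "--semi-supervised",
}


def find_removed_flags(argv):
    """Return (flag, replacement) pairs for removed flags found in argv."""
    found = []
    for arg in argv:
        flag = arg.split("=", 1)[0]  # also catch the `--flag=value` form
        if flag in REMOVED_FLAGS:
            found.append((flag, REMOVED_FLAGS[flag]))
    return found
```

For example, `find_removed_flags(["--recluster", "--quiet"])` returns `[("--recluster", None)]`, which a wrapper could turn into a warning or an error message.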

9 changes: 7 additions & 2 deletions docs/whatsnew.md
@@ -1,6 +1,11 @@
# What's New

## Unreleased github version
## Version 1.5.0 (SemiBin2 beta)

*Released Jan 17, 2023*

Big change is the addition of a `SemiBin2` script, which is still experimental, but should be a slightly nicer interface.
See [[upgrading to SemiBin2](semibin2)]

### User-visible improvements

@@ -10,7 +15,7 @@
- Added `--quiet` flag to reduce the amount of output printed
- Better `--help` (group required arguments separately)
- Add `--output-compression` option to compress outputs
- Add `--tag-output` option which allows for control of the output filenames (and also makes the anvi'o compatible)
- Add `--tag-output` option which allows for control of the output filenames (and also makes the anvi'o compatible — see discussion at [#123](https://github.com/BigDataBiology/SemiBin/issues/123).
- Add contig->bin mapping table ([#123](https://github.com/BigDataBiology/SemiBin/issues/123))
- `SemiBin.main.main1` and `SemiBin.main.main2` can now be called as a function with command line arguments (`main1` corresponds to _SemiBin1_ and `main2` corresponds to _SemiBin2_)
