Releases: NatLibFi/Annif
Annif 0.50
This release introduces a setting to use only a part of the input text for subject indexing: the new input_limit project parameter truncates the input text to the given number of characters. This can improve the quality of the suggestions, as the beginning of a long document typically includes an abstract and introduction. The default value for input_limit is zero, which means that truncation is not performed.
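The truncation behaviour described above can be sketched in Python as follows. This is a minimal illustration, not Annif's actual implementation; the function name apply_input_limit is hypothetical, and only the parameter name input_limit and the meaning of zero come from the release notes.

```python
def apply_input_limit(text: str, input_limit: int = 0) -> str:
    """Return at most input_limit leading characters of text.

    A limit of 0 (the default) disables truncation. The function name is
    hypothetical and only illustrates the behaviour described in the notes.
    """
    if input_limit < 0:
        raise ValueError("input_limit must be zero or positive")
    if input_limit == 0:
        # Zero means "no limit": the whole text is used.
        return text
    # Slicing never fails when the text is shorter than the limit.
    return text[:input_limit]
```

With this sketch, a long document passed with input_limit=5000 would be reduced to its first 5000 characters, while shorter texts pass through unchanged.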
Improvements include better handling of cached data in nn_ensemble training and optimization of memory usage in evaluation by using sparse matrices for suggested subjects. Many dependencies have been updated and a few minor issues fixed.
New features:
#446 Add a backend parameter to limit input characters in suggest
#452 Apply the input_limit backend parameter to texts in train & learn
Improvements:
#441 Sparse subjects (credit @mo-fu)
#443/#444 Allow use of cached data after cancelled training of nn_ensemble backend
Maintenance:
#448 Upgrade dependencies
#445 Upgrade LMDB dependency from 0.98 to 1.0.0
#449 Resolve DeprecationWarning: change warn to warning
Bug fixes:
#447 Fix missing default params in pav and nn ensemble
Annif 0.49
This release introduces the hyperopt CLI command for hyperparameter optimization. Initially it can only be used for finding optimal ensemble weights. The Web UI now follows the same visual style as the annif.org website. There are also some improvements to CLI commands, memory optimizations and bug fixes.
New features:
Improvements:
- #424/#426 New style for Web UI
- #430 Define short form for CLI options and fix some of their docstrings
- #428 Memory optimization: Avoid double allocation of NumPy arrays in eval operation
Maintenance:
- #437 Upgrade TensorFlow to version 2.3.0 (from 2.2.0)
Bug fixes:
Annif 0.48
This release brings a major upgrade of the fastText library, switching from the old fasttextmirror package to the new official fasttext Python bindings. The generation of fastText training files has been rewritten. The release also introduces an experimental feature to speed up model evaluation using multiprocessing: a --jobs N option can be used with the eval command to perform evaluation in N parallel jobs. Another new feature is the addition of project state details to project information listings (whether a project is trained, and the timestamp of training). Minor improvements and bug fixes are also included.
New features:
- #65/#417/#418/#425 Evaluate documents in parallel
- #329/#415 Show project train state and modification time
Improvements:
- #290/#292/#409/#412 Upgrade fastText to official version 0.9.2 (credit: @mvsjober)
- #413 Upgrade to omikuji 0.3.x
Maintenance:
- #411 Run Travis CI fastText tests on Python 3.7 instead of 3.6
- #421 Pin SciPy to 1.4.1 as required by TensorFlow 2.2.0
Bug fixes:
Annif 0.47.1
This patch release installs TensorFlow 2.2 without GPU support (introduced by default in TF 2.1), as Annif currently does not benefit from GPU support and it takes up considerable disk space. This patch reduces the size of Annif's Docker image from 2.4 GB to 1.4 GB.
Annif 0.47
This release changes the Python version requirement to 3.6+ and drops the usage of Pipenv in development installations. The TensorFlow library has been upgraded to version 2.2, which means that all features are now also supported under Python 3.8.
The eval command is supplemented with weighted subject average as a metric and with the possibility to output metrics separately for each subject (which makes it possible to explore, e.g., how often a subject was suggested correctly); some metrics are also given a more specific interpretation in the output.
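To illustrate what per-subject evaluation can reveal, here is a minimal Python sketch that counts, for each subject, how often it was suggested correctly (true positives), suggested wrongly (false positives), or missed (false negatives). The function name per_subject_counts and the data layout (one set of subject IDs per document) are assumptions made for this example; the output format of Annif's eval command may differ.

```python
from collections import Counter


def per_subject_counts(gold, suggested):
    """Count per-subject hits and misses over a document collection.

    gold and suggested are parallel lists with one set of subject IDs per
    document (an assumed layout for this illustration).
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_subjects, suggested_subjects in zip(gold, suggested):
        for subject in suggested_subjects & gold_subjects:
            tp[subject] += 1  # suggested and correct
        for subject in suggested_subjects - gold_subjects:
            fp[subject] += 1  # suggested but not in the gold standard
        for subject in gold_subjects - suggested_subjects:
            fn[subject] += 1  # in the gold standard but not suggested
    subjects = set(tp) | set(fp) | set(fn)
    return {s: {"tp": tp[s], "fp": fp[s], "fn": fn[s]} for s in sorted(subjects)}
```

Per-subject precision and recall follow directly from these counts, and an average over subjects can then be weighted by how often each subject occurs in the gold standard.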
Other changes include the possibility to display notation codes (when available) in web UI as well as minor improvements, bug fixes, and maintenance tasks.
New features:
- #392 Evaluate samples: specify interpretation of metrics (credit: @Veldhoen)
- #391/#393 Evaluation per-subject (credit: @Veldhoen)
- #390/#397 Show notations in web UI results list
Improvements:
- #405/#403 Upgrade to TensorFlow 2.2, Python 3.6+, drop Pipenv
- #395/#396 Don't give suggestions for empty input
- #389/#401/#402 Improved error handling in maui backend
- #399 Miscellaneous minor improvements for readthedocs builds
Maintenance:
Bug fixes:
Annif 0.46
This release reduces memory usage in training and adds the --cached option to the train command to reuse the already preprocessed data from the previous run. Vocabulary management is improved by allowing the labels in an existing vocabulary to be updated (renaming labels and removing subjects) without the need to retrain the project. Support for notation codes used in classifications (e.g. UDC or YKL) has been added.
New features:
- #342/#376 --cached option to reuse preprocessed training data
- #274/#383 Retain subject IDs when loading vocabulary over existing one
- #157/#385 Support for notation codes
Improvements:
- #363/#381 Use LMDB to store vectors in nn_ensemble
- #379 Use sparse vectors in PAV backend
- #382 Fix sonarqube errors (code quality problems)
Bug fixes:
Annif 0.45.3
This patch release includes the changes necessitated by the update of api.annif.org:
- enabling https in Swagger
- installing curl needed by Docker healthcheck
Annif 0.45.2
This bugfix release fixes a problem with the Maui backend that was introduced by the parameter overriding support in 0.45:
(Annif 0.45.1 was an intermediate patch release where a Docker image build issue was fixed, with no changes in the Python codebase)
Annif 0.45
This release includes a new omikuji backend to support tree-based extreme multilabel classification machine learning algorithms, which give a big improvement to the quality of the subject indexing results. The --backend-param/-p option is introduced to the CLI train and learn commands (previously that option was only available for suggest and eval); the option can be used to override the parameters from the config file. Python 3.8 support is also introduced; however, the nn_ensemble backend requires TensorFlow 2.0, which is not yet available for Python 3.8. The Vowpal Wabbit ensemble backend has been removed, as the neural network ensemble has similar features and gives better results.
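The idea of overriding config file parameters from the command line can be sketched as below. This is an illustration only, not Annif's code: the function name apply_backend_params is hypothetical, and the override syntax "<backend>.<parameter>=<value>" is an assumption made for the example.

```python
def apply_backend_params(config_params, backend_id, overrides):
    """Return a copy of config_params with matching overrides applied.

    overrides is a list of strings in the assumed form
    "<backend>.<parameter>=<value>"; entries for other backends are ignored.
    """
    merged = dict(config_params)  # never mutate the config file values
    for item in overrides:
        key, sep, value = item.partition("=")
        backend, dot, param = key.partition(".")
        if not sep or not dot or not backend or not param:
            raise ValueError(
                f"expected <backend>.<parameter>=<value>, got {item!r}"
            )
        if backend == backend_id:
            merged[param] = value  # command-line value wins over the config
    return merged
```

Values stay strings here, as they would when read from a config file; converting them to numbers is left to whatever code consumes the parameters.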
New features:
- #343/#366/#368/#371 Omikuji backend
- #250/#289 Support backend param option in train and learn commands
- #345/#370 Support for Python 3.8
Bug fixes:
Improvements/Maintenance:
Annif 0.44
This release includes a new maui backend for integrating Annif with Maui Server, a REST service wrapper for the Maui tool that will replace the similar but more limited Maui Service. The eval command has been enhanced by adding the F1@5 metric (F1 score for the top 5 suggestions) that is commonly used for comparing algorithms. There are also small improvements to the nn_ensemble backend and some bug fixes.
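For a single document, F1@5 can be sketched in Python as follows. This is a minimal example under stated assumptions, not Annif's implementation: the function name f1_at_k is hypothetical, suggestions are assumed to be unique and ordered best first, and precision is computed over the suggestions actually returned (some definitions divide by k instead).

```python
def f1_at_k(ranked_suggestions, gold_subjects, k=5):
    """F1 score computed over the top k suggestions for one document.

    ranked_suggestions: subject IDs ordered best first, assumed unique.
    gold_subjects: the manually assigned subject IDs.
    """
    top_k = ranked_suggestions[:k]
    gold = set(gold_subjects)
    hits = len(set(top_k) & gold)
    if hits == 0:
        # Also covers empty suggestions or an empty gold standard.
        return 0.0
    precision = hits / len(top_k)
    recall = hits / len(gold)
    return 2 * precision * recall / (precision + recall)
```

A collection-level score would then typically be the mean of this value over all evaluated documents.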
New features
Improvements
Bug fixes