Releases: NatLibFi/Annif

Annif 0.50

07 Dec 11:19

This release introduces a setting to use only part of the input text for subject indexing: the new input_limit project parameter truncates the input text to the given number of characters. This can improve the quality of the suggestions, as the beginning of a long document typically includes an abstract and introduction. The default value for input_limit is zero, which means that truncation is not performed.
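
The truncation described above can be illustrated with a minimal sketch. The function name apply_input_limit is hypothetical and is not taken from Annif's code base; only the input_limit parameter name and its default of zero come from these release notes.

```python
def apply_input_limit(text: str, input_limit: int = 0) -> str:
    """Return text cut to input_limit characters (hypothetical helper).

    A limit of zero (the default) means no truncation is performed.
    """
    if input_limit <= 0:
        return text
    return text[:input_limit]


# A long document whose useful summary sits at the beginning.
document = "Abstract. This paper studies subject indexing. " + "Body text. " * 1000

untouched = apply_input_limit(document)        # default: whole text is kept
shortened = apply_input_limit(document, 200)   # only the first 200 characters
print(len(untouched), len(shortened))
```

Cutting by character count is cheap and independent of the analyzer, at the cost of possibly ending in the middle of a word.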

Improvements include better handling of cached data in nn_ensemble training and optimization of memory usage in evaluation by using sparse matrices for suggested subjects. Many dependencies have been updated and a few minor issues fixed.

New features:

  • #446 Add a backend parameter to limit input characters in suggest
  • #452 Apply the input_limit backend parameter to texts in train & learn

Improvements:

  • #441 Sparse subjects (credit @mo-fu)
  • #443/#444 Allow use of cached data after cancelled training of nn_ensemble backend

Maintenance:

  • #448 Upgrade dependencies
  • #445 Upgrade LMDB dependency from 0.98 to 1.0.0
  • #449 Resolve DeprecationWarning: change warn to warning

Bug fixes:

  • #447 Fix missing default params in pav and nn ensemble

Annif 0.49

30 Jul 12:34

This release introduces the hyperopt CLI command for hyperparameter optimization. Initially it can only be used for finding optimal ensemble weights. The Web UI now follows the same visual style as the annif.org website. There are also some improvements to CLI commands, memory optimizations and bug fixes.
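
To make the idea of optimizing ensemble weights concrete, here is a small sketch under stated assumptions. It combines per-source scores as a weighted average and picks the weights by a plain grid search on toy data. The function names, the source names "a" and "b", the top-1 accuracy objective and the grid search itself are all illustrative stand-ins; the release notes do not describe how the hyperopt command searches or which metric it optimizes.

```python
import numpy as np


def combine_scores(scores_by_source, weights):
    """Weighted average of per-source score vectors (one score per subject)."""
    total = sum(weights[name] for name in scores_by_source)
    return sum(weights[name] * scores
               for name, scores in scores_by_source.items()) / total


def top1_accuracy(docs, weights):
    """Fraction of documents whose top-scoring subject is the gold subject."""
    hits = 0
    for scores_by_source, gold_index in docs:
        combined = combine_scores(scores_by_source, weights)
        hits += int(np.argmax(combined) == gold_index)
    return hits / len(docs)


def search_weights(docs, steps=11):
    """Try weights (w, 1 - w) on an even grid and keep the best-scoring pair."""
    best_weights, best_score = None, -1.0
    for w in np.linspace(0.0, 1.0, steps):
        weights = {"a": float(w), "b": float(1.0 - w)}
        score = top1_accuracy(docs, weights)
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score


# Toy data: (scores from two hypothetical sources, index of the gold subject).
toy_docs = [
    ({"a": np.array([0.9, 0.1, 0.0]), "b": np.array([0.2, 0.7, 0.1])}, 1),
    ({"a": np.array([0.1, 0.2, 0.7]), "b": np.array([0.1, 0.3, 0.6])}, 2),
]

best_weights, best_score = search_weights(toy_docs)
print(best_weights, best_score)
```

A grid search is only practical for a handful of sources; with more sources the search space grows quickly, which is one reason to use a dedicated optimizer instead.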

New features:

  • #240/#321/#414 Hyperparameter optimization of ensemble weights

Improvements:

  • #424/#426 New style for Web UI
  • #430 Define short form for CLI options and fix some of their docstrings
  • #428 Memory optimization: Avoid double allocation of NumPy arrays in eval operation

Maintenance:

  • #437 Upgrade TensorFlow to version 2.3.0 (from 2.2.0)

Bug fixes:

  • #431 Problem parsing timestamps from Maui Server
  • #432 Make modification timestamps timezone-aware

Annif 0.48

29 Jun 07:55

This release brings a major upgrade of the fastText library, switching from the old fasttextmirror package to the new official fasttext Python bindings. The generation of fastText training files has been rewritten. The release also introduces an experimental feature to speed up model evaluation using multiprocessing: a --jobs N option can be used with the eval command to perform evaluation in N parallel jobs. Another new feature is the addition of project state details to project information listings (whether a project is trained, and the timestamp of training). Minor improvements and bug fixes are also included.

New features:

Improvements:

Maintenance:

  • #411 Run Travis CI fastText tests on Python 3.7 instead of 3.6
  • #421 Pin SciPy to 1.4.1 as required by TensorFlow 2.2.0

Bug fixes:

  • #422 Assign first retrieved project to selected variable (credit: @mo-fu)
  • #419 WEB-UI: Remove empty entry from list of projects (credit: @mo-fu)
  • #357/#410 fastText training file incorrectly generated

Annif 0.47.1

12 May 14:29

This patch release installs TensorFlow 2.2 without GPU support (included by default since TF 2.1), because Annif does not currently benefit from GPU support and it takes up a lot of disk space. This reduces the size of Annif's Docker image from 2.4 GB to 1.4 GB.

Annif 0.47

11 May 07:02

This release changes the Python version requirement to 3.6+ and drops the use of Pipenv in development installations. The TensorFlow library has been upgraded to version 2.2, which means that all features are now also supported under Python 3.8.

The eval command now includes weighted subject average as a metric and can output metrics separately for each subject (making it possible to explore, for example, how often a subject was suggested correctly); some metrics are also given a more specific interpretation in the output.

Other changes include the possibility to display notation codes (when available) in the Web UI, as well as minor improvements, bug fixes, and maintenance tasks.

New features:

Improvements:

  • #405/#403 Upgrade to TensorFlow 2.2, Python 3.6+, drop Pipenv
  • #395/#396 Don't give suggestions for empty input
  • #389/#401/#402 Improved error handling in maui backend
  • #399 Miscellaneous minor improvements for readthedocs builds

Maintenance:

  • #400 Dockerfiles reorg and cleanup
  • #407 Adding secrets needed by new Drone instance

Bug fixes:

  • #394 Fix click 7.1 compatibility in tests
  • #398 Fix silently failing readthedocs builds

Annif 0.46

17 Feb 10:57

This release improves training by reducing memory usage and adds the --cached option to the train command to reuse the already preprocessed data from the previous run. Vocabulary management is improved by allowing the labels in an existing vocabulary to be updated (renaming labels and removing subjects) without the need to retrain the project. Support for notation codes used in classifications (e.g. UDC or YKL) has been added.

New features:

  • #342/#376 --cached option to reuse preprocessed training data
  • #274/#383 Retain subject IDs when loading vocabulary over existing one
  • #157/#385 Support for notation codes

Improvements:

  • #363/#381 Use LMDB to store vectors in nn_ensemble
  • #379 Use sparse vectors in PAV backend
  • #382 Fix sonarqube errors (code quality problems)

Bug fixes:

  • #386 Fix invalid "fasttext" package being installed
  • #384 Remove duplicated be param option in optimize CLI

Annif 0.45.3

17 Jan 11:12

This patch release includes the changes necessitated by the update of api.annif.org:

  • enabling https in Swagger
  • installing curl needed by Docker healthcheck

Annif 0.45.2

20 Dec 12:17

This bugfix release fixes a problem with the Maui backend that was introduced by the parameter overriding support in 0.45:

  • #372/#373: Adapt the Maui backend for parameter overriding

(Annif 0.45.1 was an intermediate patch release where a Docker image build issue was fixed, with no changes in the Python codebase)

Annif 0.45

17 Dec 13:41

This release includes a new omikuji backend to support tree-based extreme multilabel classification machine learning algorithms, which give a big improvement to the quality of the subject indexing results. The --backend-param/-p option has been added to the CLI train and learn commands (previously it was only available for suggest and eval); the option can be used to override the parameters from the config file. Python 3.8 support is also introduced; however, the nn_ensemble backend requires TensorFlow 2.0, which is not yet available for Python 3.8. The Vowpal Wabbit ensemble backend has been removed, as the neural network ensemble has similar features and gives better results.
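
The overriding behaviour of --backend-param/-p can be sketched as a simple merge in which command-line values win over config-file values. This is an illustration only: the function names are hypothetical, and the backend.param=value string format used here is an assumption that these release notes do not confirm.

```python
def parse_backend_params(param_strings):
    """Parse 'backend.param=value' strings into {backend: {param: value}}.

    The string format is an assumption made for this sketch.
    """
    overrides = {}
    for item in param_strings:
        key, sep, value = item.partition("=")
        backend, dot, param = key.partition(".")
        if not (sep and dot and backend and param):
            raise ValueError(f"expected backend.param=value, got {item!r}")
        overrides.setdefault(backend, {})[param] = value
    return overrides


def effective_params(config_params, backend, overrides):
    """Config-file values with any overrides for this backend applied on top."""
    merged = dict(config_params)          # copy, so the config stays unchanged
    merged.update(overrides.get(backend, {}))
    return merged


# Values as they might appear in a config file (hypothetical parameter names).
config = {"limit": "100", "min_df": "1"}
overrides = parse_backend_params(["omikuji.limit=50"])
print(effective_params(config, "omikuji", overrides))
```

Keeping the overrides keyed by backend means an override aimed at one backend leaves the parameters of any other backend untouched.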

New features:

Bug fixes:

  • #369 Fix for spurious "analyzer setting is missing" errors under WSGI
  • #360/#361 Launching Gunicorn

Improvements/Maintenance:

  • #367 Disable unnecessary Drone build dryruns for pushes
  • #365 Remove vw_ensemble backend
  • #359 Refactor backend project
  • #358 Mauiserver dockerization

Annif 0.44

13 Nov 14:04

This release includes a new maui backend for integrating Annif with Maui Server, a REST service wrapper for the Maui tool that will replace the similar but more limited Maui Service. The eval command has been enhanced by adding the F1@5 metric (F1 score for the top 5 suggestions) that is commonly used for comparing algorithms. There are also small improvements to the nn_ensemble backend and some bug fixes.
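
For readers unfamiliar with F1@5, the following sketch computes it for a single document: precision and recall are measured over the five highest-ranked suggestions only and then combined into an F1 score. The function name f1_at_k and the example subjects are made up for illustration, and the notes do not say how Annif aggregates the score across documents.

```python
def f1_at_k(ranked_suggestions, gold_subjects, k=5):
    """F1 score computed over the top k suggestions for one document."""
    top_k = ranked_suggestions[:k]
    gold = set(gold_subjects)
    if not top_k or not gold:
        return 0.0
    true_positives = sum(1 for subject in top_k if subject in gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(top_k)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Suggestions ordered from highest to lowest score; "cows" falls outside the top 5.
ranked = ["cats", "dogs", "birds", "fish", "horses", "cows"]
gold = {"dogs", "fish", "cows"}

# 2 hits in the top 5 and 3 gold subjects: precision 2/5, recall 2/3, F1 = 1/2.
score = f1_at_k(ranked, gold)
print(round(score, 3))
```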

New features:

  • #269/#344/#352 Add Maui Server backend
  • #354 Always compute F1@5 metric when evaluating

Improvements:

  • #355/#356 Support learn-epochs parameter in nn_ensemble backend

Bug fixes:

  • #350/#351 Fail gracefully if trying to evaluate an empty corpus
  • #307/#353 Accept UTF-8 files with Byte Order Mark (BOM)