Releases: mrjleo/fast-forward-indexes
Fast-Forward Indexes v0.7.0
Encoders
- TransformerEncoder class has been made more modular to make extending it easier.
- TASBEncoder, ContrieverEncoder, and BGEEncoder classes have been added.
- Defaults for the model argument have been added to Transformer encoder classes.
Misc
- python-terrier dependency has been increased to 0.12.
- PyTerrier transformers now call pyterrier.model.add_ranks.
Fast-Forward Indexes v0.6.0
Toolchain
- Now uses uv for dependency management.
- Linting and format checking using ruff has been enabled and configured.
- Type checking using pyright has been enabled and configured.
Codebase
- Minimum supported Python version is now 3.10.
- Type hints have been modernized.
- Repository has been converted to src-layout.
- Docstrings have been converted to ReST format.
- Library is now pyright-compliant (standard mode).
- py.typed marker has been added.
API changes
- All indexes and Mode are now imported from fast_forward.index.
- All quantizers are now imported from fast_forward.quantizer.
- Indexer is now imported from fast_forward.util.
Fast-Forward Indexes v0.5.1
- Ranking.interpolate and Ranking.__add__ (+ operator) now treat missing scores in either ranking as zero.
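
The zero-fill behavior described above can be illustrated with plain dictionaries. This is a minimal sketch, not the library's implementation: the helper names `add_rankings` and `interpolate`, the single-query `{doc_id: score}` representation, and the convention that `alpha` weights the first ranking are all assumptions made for illustration.

```python
def add_rankings(a, b):
    """Sum two {doc_id: score} rankings; a missing score counts as 0."""
    doc_ids = set(a) | set(b)
    return {d: a.get(d, 0.0) + b.get(d, 0.0) for d in doc_ids}


def interpolate(a, b, alpha):
    """Weighted sum alpha * a + (1 - alpha) * b; a missing score counts as 0.

    Assumption: alpha weights the first ranking.
    """
    doc_ids = set(a) | set(b)
    return {d: alpha * a.get(d, 0.0) + (1 - alpha) * b.get(d, 0.0) for d in doc_ids}


# Hypothetical scores for a single query; "d1" and "d3" each appear in only one ranking.
sparse = {"d1": 2.0, "d2": 1.0}
dense = {"d2": 0.5, "d3": 4.0}

summed = add_rankings(sparse, dense)
mixed = interpolate(sparse, dense, alpha=0.5)
print(summed)
print(mixed)
```

Documents that appear in only one of the two rankings stay in the result and simply receive no contribution from the other ranking.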
Fast-Forward Indexes v0.5.0
Ranking operations
- Rankings now implement the + and * operators.
- Rankings can now be normalized via Ranking.normalize (min-max normalization).
- Reciprocal rank fusion is now supported via Ranking.rr_scores.
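
For readers unfamiliar with the two techniques named above, the following sketch shows min-max normalization and reciprocal rank fusion on plain dictionaries. It does not use the library's API: the function names, the `{doc_id: score}` representation, the handling of constant scores, and the constant `k=60` (a commonly used value for reciprocal rank fusion) are assumptions for illustration only.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} ranking linearly to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: a ranking with identical scores maps to all zeros.
        return {doc_id: 0.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def rr_scores(scores, k=60):
    """Replace each score with 1 / (k + rank), where rank 1 is the best."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {doc_id: 1.0 / (k + rank) for rank, doc_id in enumerate(ranked, start=1)}


def add(a, b):
    """Sum two rankings, treating missing scores as zero."""
    return {d: a.get(d, 0.0) + b.get(d, 0.0) for d in set(a) | set(b)}


# Hypothetical scores for a single query.
sparse = {"d1": 10.0, "d2": 5.0, "d3": 0.0}
dense = {"d3": 3.0, "d1": 2.0}

normalized = min_max_normalize(sparse)
fused = add(rr_scores(sparse), rr_scores(dense))
fused_order = sorted(fused, key=fused.get, reverse=True)
print(normalized)
print(fused_order)
```

Fusing by summing the reciprocal-rank scores depends only on rank positions, so the two input rankings do not need comparable score scales.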
Misc
- Index.__call__ now accepts a batch_size argument.
Fast-Forward Indexes v0.4.1
- Fixed a bug where OnDiskIndex would not respect the resize_min_val argument properly.
- Fixed a bug where Indexer.from_dicts would ignore the encoder batch size in some cases.
- Minor updates in the documentation for Indexer.
Fast-Forward Indexes v0.4.0
Indexer
- Now supports transferring vectors from one index to another.
- Now supports automatically training a quantizer during indexing.
- The encoder has been made optional.
API changes
- Indexer.index_dicts has been renamed to Indexer.from_dicts.
- Indexer now takes a batch_size and an encoder_batch_size.
- Index.__call__: early_stopping_intervals has been renamed to early_stopping_depths.
- OnDiskIndex: ds_buffer_size has been renamed to max_indexing_size.
- OnDiskIndex.to_memory: buffer_size has been renamed to batch_size.
- util.create_coalesced_index: buffer_size has been renamed to batch_size.
Fast-Forward Indexes v0.3.1
- Optimized product quantization has been implemented via fast_forward.quantizer.nanopq.NanoOPQ.
- Index.quantizer property has been added, which allows attaching a quantizer to an existing empty index.
- Some outdated code snippets in the documentation have been fixed.
Fast-Forward Indexes v0.3.0
Index operations
- When calling Index.add, the sequences doc_ids and psg_ids can now contain None elements, as long as each vector has at least one ID.
- Indexes (vectors and corresponding IDs) can now be iterated over using Index.batch_iter and Index.__iter__.
Vector quantization
- Indexes now support vector quantization via the fast_forward.quantizer.Quantizer interface.
- fast_forward.quantizer.nanopq.NanoPQ implements product quantization based on nanopq.
Misc
- The default ranking mode has been changed to MAXP.
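
As background on the MAXP mode mentioned above: MaxP usually means that a document is scored by the best-scoring of its passages. The sketch below illustrates that idea on plain dictionaries, under the assumption that this is what the mode does here; the function name `maxp` and the passage-to-document mapping are hypothetical and not part of the library.

```python
def maxp(passage_scores, psg_to_doc):
    """Aggregate passage scores into document scores by taking the maximum."""
    doc_scores = {}
    for psg_id, score in passage_scores.items():
        doc_id = psg_to_doc[psg_id]
        # Keep the highest passage score seen so far for this document.
        doc_scores[doc_id] = max(score, doc_scores.get(doc_id, float("-inf")))
    return doc_scores


# Hypothetical data: passages p1 and p2 belong to d1, p3 belongs to d2.
passage_scores = {"p1": 0.2, "p2": 0.9, "p3": 0.4}
psg_to_doc = {"p1": "d1", "p2": "d1", "p3": "d2"}

doc_scores = maxp(passage_scores, psg_to_doc)
print(doc_scores)
```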
API changes
- The dim argument has been removed from OnDiskIndex and InMemoryIndex.
- The dtype argument has been removed from OnDiskIndex.
Fast-Forward Indexes v0.2.1
- Transformer-based encoders now use torch.no_grad
- Requirements have been made more precise by fixing the major versions
- Minor optimizations for early stopping
- Minor fixes in the documentation
Fast-Forward Indexes v0.2.0
Index structures
- New: OnDiskIndex is based on HDF5 and can be accessed on-demand from disk
- Indexes can now grow dynamically in size
Performance
- Data is now represented using pandas data frames internally
- Many operations have been vectorized to improve performance
- Early stopping now works in batches rather than per query
Misc
- New: Indexer class for indexing corpora
- New: PyTerrier transformers are provided for scoring and interpolation using Fast-Forward indexes
API changes
Many parts of the API have changed. Some of the most important breaking changes:
- Scores are now computed using Index.__call__
- Queries are not explicitly provided anymore but attached to the ranking
- InMemoryIndex objects cannot be saved to or loaded from disk anymore