This repository contains some code that wraps the sentence_transformers
PyPI package.
The code is a Hansken extraction plugin; see the Hansken extraction plugin SDK documentation for details.
It can apply any of the available sentence transformer models to the chatMessage.message
field.
The plugin can be adapted to use a different model. It can also run on other fields: just change the matcher and the getter.
Always check a model's model card or README before using it.
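As a rough illustration of what applying a sentence transformer to message text involves, here is a minimal sketch that is independent of the plugin code in this repository. The model name `all-MiniLM-L6-v2` and the helper names `load_encoder` and `embed_messages` are assumptions made for this example; the plugin itself may load and call its model differently.

```python
from typing import Callable, List, Sequence, Tuple


def load_encoder(model_name: str = "all-MiniLM-L6-v2") -> Callable:
    """Return the encode method of a sentence_transformers model.

    The model name is an assumption for this sketch. The import is done
    lazily so the helper below can be used without the package installed.
    """
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_name).encode


def embed_messages(
    messages: Sequence[str], encode: Callable
) -> List[Tuple[str, List[float]]]:
    """Embed every non-empty message and return (text, vector) pairs."""
    # Skip empty or whitespace-only messages; they carry no content to embed.
    texts = [m for m in messages if m and m.strip()]
    if not texts:
        return []
    # encode() takes a list of strings and yields one vector per string.
    vectors = encode(texts)
    return [(t, [float(x) for x in v]) for t, v in zip(texts, vectors)]
```

With the package installed, `embed_messages(["see you at 8"], load_encoder())` would return one pair holding the message and its embedding. Passing the encoder in as an argument keeps the model choice in one place, which makes it easier to swap models after reading their model cards.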
The .sb files are Starboard Notebook files.
These files can be imported in the Code Notebooks that are included in the Expert UI of Hansken.
The notebooks are an example of how you can sort all chat messages in a case based on their similarity using hansken.py.
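The core idea behind sorting messages by similarity can be sketched in plain Python. This is not the notebook code: it assumes the embeddings have already been retrieved (the notebooks use hansken.py for that, which is not shown here) and are available as (text, vector) pairs. Cosine similarity is used as a common choice of measure, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    embedded: Sequence[Tuple[str, Sequence[float]]],
) -> List[Tuple[str, float]]:
    """Return (text, score) pairs, most similar to the query vector first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in embedded]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Given the embedding of a message of interest as `query`, `rank_by_similarity` orders all other messages so that the closest matches appear first.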
To run the plugin, pull the latest image to the location where Hansken looks for plugins:
docker pull ghcr.io/netherlandsforensicinstitute/bert-embeddings:latest
Or clone (and modify) this repository and build your own copy using:
git clone https://github.com/netherlandsforensicinstitute/bert-embeddings
cd bert-embeddings
build_plugin bert_embeddings.py . bert-embeddings